    Evaluating Music Recommender Systems for Groups

    Recommendation to groups of users is a challenging task that has so far received only passing attention. Evaluation in particular often appears ad hoc: instead of truly evaluating on groups of users, it synthesizes groups by merging individual preferences. In this paper, we present a user study that records the individual and shared preferences of actual groups of participants, resulting in a robust, standardized evaluation benchmark. Using this benchmarking dataset, which we share with the research community, we compare the respective performance of a wide range of music group recommendation techniques proposed in the
    Comment: Presented at the 2017 Workshop on Value-Aware and Multistakeholder Recommendatio

    Unsupervised Learning of Parsimonious General-Purpose Embeddings for User and Location Modelling

    Many social network applications depend on robust representations of spatio-temporal data. In this work, we present an embedding model based on feed-forward neural networks which transforms social media check-ins into dense feature vectors encoding geographic, temporal, and functional aspects for modelling places, neighborhoods, and users. We employ the embedding model in a variety of applications including location recommendation, urban functional zone study, and crime prediction. For location recommendation, we propose a Spatio-Temporal Embedding Similarity algorithm (STES) based on the embedding model. In a range of experiments on real-life data collected from Foursquare, we demonstrate our model's effectiveness at characterizing places and people and its applicability in the aforementioned problem domains. Finally, we select eight major cities around the globe and verify the robustness and generality of our model by porting pre-trained models from one city to another, thereby alleviating the need for costly local training.

    Embedding Electronic Health Records for Clinical Information Retrieval

    Neural network representation learning frameworks have recently been shown to be highly effective at a wide range of tasks ranging from radiography interpretation via data-driven diagnostics to clinical decision support. This often superior performance comes at the price of dramatically increased training data requirements that cannot be satisfied in every given institution or scenario. As a means of countering such data sparsity effects, distant supervision alleviates the need for scarce in-domain data by relying on a related, resource-rich task for training. This study presents an end-to-end neural clinical decision support system that recommends relevant literature for individual patients (few available resources) via distant supervision on the well-known MIMIC-III collection (abundant resource). Our experiments show significant improvements in retrieval effectiveness over traditional statistical as well as purely locally supervised retrieval models.
    Comment: Published in AMIA Annual Symposium 201

    Biomedical Question Answering via Weighted Neural Network Passage Retrieval

    The amount of publicly available biomedical literature has been growing rapidly in recent years, yet question answering systems still struggle to exploit the full potential of this source of data. In a preliminary processing step, many question answering systems rely on retrieval models for identifying relevant documents and passages. This paper proposes a weighted cosine distance retrieval scheme based on neural network word embeddings. Our experiments are based on publicly available data and tasks from the BioASQ biomedical question answering challenge and demonstrate significant performance gains over a wide range of state-of-the-art models.
    Comment: To appear in ECIR 201
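
The weighted cosine idea mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not say how the weights are defined or applied, so the per-dimension weighting used here is an assumption, and the function names `weighted_cosine_similarity` and `rank_passages` are hypothetical, not taken from the paper.

```python
import math


def weighted_cosine_similarity(u, v, w):
    """Cosine similarity in which dimension i is scaled by a non-negative weight w[i].

    Assumption: weights act per embedding dimension; the paper may weight differently.
    """
    if not (len(u) == len(v) == len(w)):
        raise ValueError("u, v and w must have the same length")
    dot = sum(wi * ui * vi for ui, vi, wi in zip(u, v, w))
    norm_u = math.sqrt(sum(wi * ui * ui for ui, wi in zip(u, w)))
    norm_v = math.sqrt(sum(wi * vi * vi for vi, wi in zip(v, w)))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a zero vector has no direction, so treat it as dissimilar
    return dot / (norm_u * norm_v)


def rank_passages(query_vec, passage_vecs, weights):
    """Return passage indices ordered from most to least similar to the query."""
    scored = [
        (weighted_cosine_similarity(query_vec, p, weights), i)
        for i, p in enumerate(passage_vecs)
    ]
    scored.sort(key=lambda pair: -pair[0])
    return [i for _, i in scored]
```

With all weights equal to 1 this reduces to ordinary cosine similarity; the corresponding distance is one minus the similarity.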

    A Cross-Platform Collection of Social Network Profiles

    The proliferation of Internet-enabled devices and services has led to a shifting balance between digital and analogue aspects of our everyday lives. In the face of this development there is a growing demand for the study of privacy hazards, the potential for unique user de-anonymization and information leakage between the various social media profiles many of us maintain. To enable the structured study of such adversarial effects, this paper presents a dedicated dataset of cross-platform social network personas (i.e., the same person has accounts on multiple platforms). The corpus comprises 850 users who generate predominantly English content. Each user object contains the online footprint of the same person in three distinct social networks: Twitter, Instagram and Foursquare. In total, it encompasses over 2.5M tweets, 340k check-ins and 42k Instagram posts. We describe the collection methodology, characteristics of the dataset, and how to obtain it. Finally, we discuss a common use case, cross-platform user identification.
    Comment: 4 pages, 5 figures, SIGIR 2016, short paper. SIGIR 2016 Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieva

    Privacy Leakage through Innocent Content Sharing in Online Social Networks

    The increased popularity and ubiquitous availability of online social networks and globalised Internet access have affected the way in which people share content. The information that users willingly disclose on these platforms can be used for various purposes, from building consumer models for advertising, to inferring personal, potentially invasive, information. In this work, we use Twitter, Instagram and Foursquare data to convey the idea that the content shared by users, especially when aggregated across platforms, can potentially disclose more information than was originally intended. We perform two case studies: First, we perform user de-anonymization by mimicking the scenario of finding the identity of a user making anonymous posts within a group of users. Empirical evaluation on a sample of real-world social network profiles suggests that cross-platform aggregation introduces significant performance gains in user identification. In the second task, we show that it is possible to infer physical location visits of a user on the basis of shared Twitter and Instagram content. We present an informativeness scoring function which estimates the relevance and novelty of a shared piece of information with respect to an inference task. This measure is validated using an active learning framework which chooses the most informative content at each given point in time. Based on a large-scale data sample, we show that by doing this, we can attain an improved inference performance. In some cases this performance exceeds even the use of the user's full timeline.
    Comment: 8 pages, 10 figures, submitted to Privacy Preserving Workshop, Sigi

    Semantic Place Descriptors for Classification and Map Discovery

    Urban environments develop complex, non-obvious structures that are often hard to represent in the form of maps or guides. Finding the right place to go often requires intimate familiarity with the location in question and cannot easily be deduced by visitors. In this work, we exploit large-scale samples of usage information, in the form of mobile phone traces and geo-tagged Twitter messages, in order to automatically explore and annotate city maps via kernel density estimation. Our experiments are based on one year's worth of mobile phone activity collected by Nokia's Mobile Data Challenge (MDC). We show that usage information can be a strong predictor of semantic place categories, allowing us to automatically annotate maps based on the behavior of the local user base.
    Comment: 13 pages, 1 figure, 1 tabl

    Axial quasinormal modes of static neutron stars in the nonminimal derivative coupling sector of Horndeski gravity: spectrum and universal relations for realistic equations of state

    We study axial quasinormal modes of static neutron stars in the nonminimal derivative coupling sector of Horndeski theory. We focus on the fundamental curvature mode, which we analyse for ten different equations of state with different matter content. A comparison with the results obtained in pure General Relativity reveals that, apart from modifying the spectrum of the frequencies and the damping times of the stars, this theory modifies several universal relations between the modes and physical parameters of the stars that are otherwise matter-independent.
    Comment: 10 pages, 9 figures. References added, typos corrected. Matches published versio

    Indonesian Chinese in the Netherlands and the Legacies of Violence in Colonial and Post-colonial Indonesia

    After Indonesian independence in 1945, thousands of Indonesian Chinese repatriated to the Netherlands, the former colonizer. As opposed to other repatriates from Indonesia, who organized themselves into pressure groups and fought for a place in the national memory culture, the Indonesian Chinese in the Netherlands only formed strict socio-cultural associations and have generally stayed clear of identity politics. Usually, this divergence is attributed to the smooth integration and socio-economic success of the latter group, as well as to Chinese values, such as conflict avoidance. This article adds to this explanation by positing that this phenomenon has also been induced by the legacy of anti-Chinese violence in colonial and post-colonial Indonesia: respectively, Dutch discomfort in acknowledging the violent and discriminatory elements of its own colonial history, as well as a fear of offending the Indonesian government. Consequently, many Indonesian Chinese in the Netherlands have engaged in some form of public self-silencing.

    Web2Text: Deep Structured Boilerplate Removal

    Web pages are a valuable source of information for many natural language processing and information retrieval tasks. Extracting the main content from those documents is essential for the performance of derived applications. To address this issue, we introduce a novel model that performs sequence labeling to collectively classify all text blocks in an HTML page as either boilerplate or main content. Our method uses a hidden Markov model on top of potentials derived from DOM tree features using convolutional neural networks. The proposed method sets a new state-of-the-art performance for boilerplate removal on the CleanEval benchmark. As a component of information retrieval pipelines, it improves retrieval performance on the ClueWeb12 collection.
    Comment: To appear in ECIR 201
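
The sequence-labeling step described in the abstract above (jointly labeling all text blocks given per-block and transition potentials) can be sketched with standard Viterbi decoding. This is only an illustration under stated assumptions: the potentials below are hand-set log-scores rather than CNN outputs, a single fixed transition matrix is assumed for all positions, and the function name `viterbi` and the label encoding (0 = boilerplate, 1 = main content) are hypothetical choices, not details from the paper.

```python
def viterbi(unary, pairwise):
    """Find the highest-scoring label sequence.

    unary[t][s]    : log-potential of label s at text block t
    pairwise[a][b] : log-potential of moving from label a to label b
                     (assumed identical at every position)
    """
    n_labels = len(unary[0])
    score = list(unary[0])  # best score of any path ending in each label
    backpointers = []       # backpointers[t-1][s] = best previous label for s at t

    for t in range(1, len(unary)):
        new_score, pointers = [], []
        for s in range(n_labels):
            best_prev = max(range(n_labels), key=lambda p: score[p] + pairwise[p][s])
            pointers.append(best_prev)
            new_score.append(score[best_prev] + pairwise[best_prev][s] + unary[t][s])
        score = new_score
        backpointers.append(pointers)

    # Trace back from the best final label to recover the full path.
    path = [max(range(n_labels), key=lambda s: score[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    # Labels: 0 = boilerplate, 1 = main content (illustrative values only).
    unary = [[0.0, -2.0], [-1.0, -0.9], [0.0, -2.0]]
    pairwise = [[0.0, -1.0], [-1.0, 0.0]]  # switching labels is penalised
    print(viterbi(unary, pairwise))
```

In this toy input the middle block slightly prefers label 1 on its own, but the transition penalty makes the jointly best labeling keep all three blocks at label 0, which is the kind of collective decision that per-block classification cannot make.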