A large multilingual and multi-domain dataset for recommender systems
This paper presents a multi-domain interests dataset to train and test Recommender Systems, and the methodology to create the dataset
from Twitter messages in English and Italian. The English dataset includes an average of 90 preferences per user on music, books,
movies, celebrities, sport, politics and much more, for about half a million users. Preferences are either extracted from messages of
users who use Spotify, Goodreads and other similar content-sharing platforms, or induced from their "topical" friends, i.e., followees
representing an interest rather than a social relation between peers. In addition, preferred items are matched with Wikipedia articles
describing them. This unique feature of our dataset provides a means to derive a semantic categorization of the preferred items, exploiting
available semantic resources linked to Wikipedia such as the Wikipedia Category Graph, DBpedia, BabelNet and others.
Exploring Topic-based Language Models for Effective Web Information Retrieval
The main obstacle for providing focused search is the relative opaqueness of search requests: searchers tend to express their complex information needs in only a couple of keywords. Our overall aim is to find out if, and how, topic-based language models can lead to more effective web information retrieval. In this paper we explore the retrieval performance of a topic-based model that combines topical models with other language models based on cross-entropy. We first define our topical categories and train our topical models on the .GOV2 corpus by building parsimonious language models. We then test the topic-based model on the TREC8 small Web data collection for ad-hoc search. Our experimental results show that the topic-based model outperforms the standard language model and the parsimonious model.
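The abstract trains topical models as parsimonious language models and combines them with other language models through cross-entropy, but gives no formulas. The sketch below assumes one common reading: the topical model is estimated with the usual EM procedure for parsimonious language models, then linearly interpolated with a document model and a collection model, and documents are ranked by the negative cross-entropy of the query model against that mixture. The helper names, the mixture weights (`lam_doc`, `lam_topic`, `lam_coll`), the EM weight `lam`, the pruning threshold and the toy data are illustrative assumptions, not settings from the paper.

```python
import math
from collections import Counter


def mle(tokens):
    """Maximum-likelihood unigram model: term -> relative frequency."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def parsimonious(tokens, coll_model, lam=0.1, iters=20, threshold=1e-4):
    """EM estimate of a parsimonious language model (assumed formulation).

    Terms the collection model already explains well lose probability mass;
    terms whose probability falls below `threshold` are pruned.
    """
    tf = Counter(tokens)
    p = mle(tokens)
    for _ in range(iters):
        # E-step: expected count of each term generated by the topical model.
        e = {}
        for t, f in tf.items():
            num = lam * p.get(t, 0.0)
            den = num + (1 - lam) * coll_model.get(t, 0.0)
            e[t] = f * num / den if den > 0 else 0.0
        # M-step: normalise, prune tiny probabilities, renormalise.
        total = sum(e.values())
        p = {t: v / total for t, v in e.items() if v / total >= threshold}
        norm = sum(p.values())
        p = {t: v / norm for t, v in p.items()}
    return p


def cross_entropy_score(query, doc_model, topic_model, coll_model,
                        lam_doc=0.6, lam_topic=0.2, lam_coll=0.2):
    """Negative cross-entropy of the query model against a mixed document model.

    Higher (less negative) is a better match; hypothetical mixture weights.
    """
    score = 0.0
    for w, p_q in mle(query).items():
        p = (lam_doc * doc_model.get(w, 0.0)
             + lam_topic * topic_model.get(w, 0.0)
             + lam_coll * coll_model.get(w, 0.0))
        if p == 0.0:
            return float("-inf")  # query term unknown to every model
        score += p_q * math.log(p)
    return score


# Toy example: all text below is hypothetical.
collection = ("the tax forms the irs refund the federal health grants "
              "the energy policy the tax").split()
coll = mle(collection)
topic = parsimonious("the tax forms the irs refund the tax".split(), coll)
d1 = mle("the irs tax forms and refund help".split())
d2 = mle("the federal energy policy grants".split())
s1 = cross_entropy_score(["tax", "forms"], d1, topic, coll)
s2 = cross_entropy_score(["tax", "forms"], d2, topic, coll)
```

In this toy run the parsimonious topical model pushes probability away from the frequent collection word "the", and the document containing the query terms scores higher than the one that does not.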
Stop Clickbait: Detecting and Preventing Clickbaits in Online News Media
Most of the online news media outlets rely heavily on the revenues generated
from the clicks made by their readers, and due to the presence of numerous such
outlets, they need to compete with each other for reader attention. To attract
the readers to click on an article and subsequently visit the media site, the
outlets often come up with catchy headlines accompanying the article links,
which lure the readers to click on the link. Such headlines are known as
Clickbaits. While these baits may trick the readers into clicking, in the long
run, clickbaits usually don't live up to the expectation of the readers, and
leave them disappointed.
In this work, we attempt to automatically detect clickbaits and then build a
browser extension which warns the readers of different media sites about the
possibility of being baited by such headlines. The extension also offers each
reader an option to block clickbaits she doesn't want to see. Then, using such
reader choices, the extension automatically blocks similar clickbaits during
her future visits. We run extensive offline and online experiments across
multiple media sites and find that the proposed clickbait detection and the
personalized blocking approaches perform very well, achieving 93% accuracy in
detecting and 89% accuracy in blocking clickbaits.
Comment: 2016 IEEE/ACM International Conference on Advances in Social Networks
Analysis and Mining (ASONAM)
Events and Controversies: Influences of a Shocking News Event on Information Seeking
It has been suggested that online search and retrieval contributes to the
intellectual isolation of users within their preexisting ideologies, where
people's prior views are strengthened and alternative viewpoints are
infrequently encountered. This so-called "filter bubble" phenomenon has been
called out as especially detrimental when it comes to dialog among people on
controversial, emotionally charged topics, such as the labeling of genetically
modified food, the right to bear arms, the death penalty, and online privacy.
We seek to identify and study information-seeking behavior and access to
alternative versus reinforcing viewpoints following shocking, emotional, and
large-scale news events. As a case study, we analyze search and
browsing on gun control/rights, a strongly polarizing topic for both citizens
and leaders of the United States. We study the period of time preceding and
following a mass shooting to understand how its occurrence, follow-on
discussions, and debate may have been linked to changes in the patterns of
searching and browsing. We employ information-theoretic measures to quantify
the diversity of Web domains of interest to users and understand the browsing
patterns of users. We use these measures to characterize the influence of news
events on these web search and browsing patterns.
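The abstract refers to information-theoretic measures of the diversity of Web domains without naming them. A minimal sketch, assuming Shannon entropy over the distribution of visited domains (one common choice; the helper `domain_entropy` and the example URLs are hypothetical): a user whose browsing concentrates on a few domains scores low, one spread evenly across many domains scores high, so comparing values before and after an event indicates a shift in diversity.

```python
import math
from collections import Counter
from urllib.parse import urlparse


def domain_entropy(urls):
    """Shannon entropy, in bits, of the distribution of Web domains in `urls`."""
    domains = Counter(urlparse(u).netloc.lower() for u in urls)
    total = sum(domains.values())
    return -sum((c / total) * math.log2(c / total) for c in domains.values())


# Hypothetical browsing logs for one user before and after a news event.
before_event = [
    "https://news-a.example/story1",
    "https://news-a.example/story2",
    "https://news-a.example/story3",
    "https://blog-b.example/post",
]
after_event = [
    "https://news-a.example/story4",
    "https://blog-b.example/post2",
    "https://forum-c.example/thread",
    "https://wiki-d.example/article",
]
h_before = domain_entropy(before_event)
h_after = domain_entropy(after_event)
```

Here the first log (three visits to one domain, one to another) has an entropy of about 0.81 bits, while the second (four distinct domains) reaches the maximum of 2 bits for four visits.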
Reading the Source Code of Social Ties
Though online social network research has exploded during the past years, not
much thought has been given to the exploration of the nature of social links.
Online interactions have been interpreted as indicative of one social process
or another (e.g., status exchange or trust), often with little systematic
justification regarding the relation between observed data and theoretical
concept. Our research aims to bridge this gap in computational social science
by proposing an unsupervised, parameter-free method to discover, with high
accuracy, the fundamental domains of interaction occurring in social networks.
By applying this method to two online datasets differing in scope and type of
interaction (aNobii and Flickr) we observe the spontaneous emergence of three
domains of interaction representing the exchange of status, knowledge and
social support. By finding significant relations between the domains of
interaction and classic social network analysis issues (e.g., tie strength,
dyadic interaction over time) we show how the network of interactions induced
by the extracted domains can be used as a starting point for more nuanced
analysis of online social data that may one day incorporate the normative
grammar of social interaction. Our method finds applications in online social
media services ranging from recommendation to visual link summarization.
Comment: 10 pages, 8 figures, Proceedings of the 2014 ACM conference on Web
Science (WebSci'14)
TREatment of ATopic eczema (TREAT) Registry Taskforce: consensus on how and when to measure the core dataset for atopic eczema treatment research registries.
Background: Comparative, real-life and long-term evidence on the effectiveness and safety of phototherapy and systemic therapy in moderate-to-severe atopic eczema (AE) is limited. Such data must come from well-designed prospective patient registries. Standardization of data collection is needed for direct comparisons and data pooling. Objectives: To reach a consensus on how and when to measure the previously defined domain items of the TREatment of ATopic eczema (TREAT) Registry Taskforce core dataset for research registries for paediatric and adult patients with AE. Methods: Proposals for the measurement instruments were based on recommendations of the Harmonising Outcome Measures for Eczema (HOME) initiative, the existing AE database of TREATgermany, systematic reviews of the literature and expert opinions. The proposals were discussed at three face-to-face consensus meetings, one teleconference and via e-mail. The frequency of follow-up visits was determined by an expert survey. Results: A total of 16 experts from seven countries participated in the 'how to measure' consensus process and 12 external experts were consulted. A consensus was reached for all domain items on how they should be measured by assigning measurement instruments. A minimum follow-up frequency was defined: initially 4 weeks after commencing treatment, then every 3 months while on treatment and every 6 months while off treatment. Conclusions: This core dataset for national AE research registries will aid in the comparability and pooling of data across centres and country borders, and will enable international collaboration to assess the long-term effectiveness and safety of phototherapy and systemic therapy used in patients with AE. What's already known about this topic? Comparable, real-life and long-term data on the effectiveness and safety of phototherapy and systemic therapy in patients with atopic eczema (AE) are needed.
There is a high diversity of outcomes and instruments used in AE research, which requires harmonization to enhance comparability and allow data pooling. What does this study add? Our taskforce has reached international consensus on how and when to measure core domain items for national AE research registries. This core dataset is now available for use by researchers worldwide and will aid in the collection of unified data. What are the clinical implications of this work? The data collected through this core dataset will help to gain better insights into the long-term effectiveness and safety of phototherapy and systemic therapy in AE and will provide important information for clinical practice. Standardization of such data collection at the national level will also allow direct data comparisons and pooling across country borders (e.g. in the analysis of treatment-related adverse events that require large patient numbers).
Treatment recommendations for psoriatic arthritis
Objective: To develop comprehensive recommendations for the treatment of the various clinical manifestations of psoriatic arthritis (PsA) based on evidence obtained from a systematic review of the literature and from consensus opinion. Methods: Formal literature reviews of treatment for the most significant discrete clinical manifestations of PsA (skin and nails, peripheral arthritis, axial disease, dactylitis and enthesitis) were performed and published by members of the Group for Research and Assessment of Psoriasis and Psoriatic Arthritis (GRAPPA). Treatment recommendations were drafted for each of the clinical manifestations by rheumatologists, dermatologists and PsA patients based on the literature reviews and consensus opinion. The level of agreement for the individual treatment recommendations among GRAPPA members was assessed with an online questionnaire. Results: Treatment recommendations were developed for peripheral arthritis, axial disease, psoriasis, nail disease, dactylitis and enthesitis in the setting of PsA. In total, 19 recommendations were drafted, and over 80% agreement was obtained on 16 of them. In addition, a grid that factors disease severity into each of the different disease manifestations was developed to help the clinician with treatment decisions for the individual patient from an evidence-based perspective. Conclusions: Treatment recommendations for the cardinal physical manifestations of PsA were developed based on a literature review and consensus between rheumatologists and dermatologists. In addition, a grid was established to assist in therapeutic reasoning and decision making for individual patients. It is anticipated that periodic updates will take place using this framework as new data become available.
What Users Ask a Search Engine: Analyzing One Billion Russian Question Queries
We analyze the question queries submitted to a large commercial web search engine to get insights about what people ask, and to better tailor the search results to the users' needs. Based on a dataset of about one billion question queries submitted during the year 2012, we investigate askers' querying behavior with the support of automatic query categorization. While the importance of question queries is likely to increase, at present they only make up 3–4% of the total search traffic. Since questions are such a small part of the query stream and are more likely to be unique than shorter queries, clickthrough information is typically rather sparse. Thus, query categorization methods based on the categories of clicked web documents do not work well for questions. As an alternative, we propose a robust question query classification method that uses the labeled questions from a large community question answering (CQA) platform as a training set. The resulting classifier is then transferred to the web search questions. Even though questions on CQA platforms tend to be different from web search questions, our categorization method proves competitive with strong baselines with respect to classification accuracy. To show the scalability of our proposed method we apply the classifiers to about one billion question queries and discuss the trade-offs between performance and accuracy that different classification models offer. Our findings reveal what people ask a search engine and also how this contrasts with behavior on a CQA platform.
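The abstract trains a question classifier on labeled CQA questions and transfers it unchanged to web search questions, without naming the models it compares. As a stand-in, this sketch uses multinomial naive Bayes written from scratch; the class `NaiveBayes`, the category names and all example questions are hypothetical. What it illustrates is the transfer setup: the training labels come from the CQA platform, so no click data for the web queries is needed.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes over whitespace tokens, with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, text):
        n = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c, count in self.class_counts.items():
            score = math.log(count / n)  # log prior of the category
            for w in text.lower().split():
                if w in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.word_counts[c][w] + 1)
                                      / (self.totals[c] + v))
            if score > best_score:
                best, best_score = c, score
        return best


# Hypothetical CQA questions with the categories assigned on the platform.
cqa_questions = [
    "how to cook rice in a pot",
    "what temperature to bake chicken",
    "how to fix a flat bike tire",
    "why does my car engine overheat",
]
cqa_labels = ["Food", "Food", "Vehicles", "Vehicles"]

clf = NaiveBayes().fit(cqa_questions, cqa_labels)
# Transfer: apply the CQA-trained model to (hypothetical) web search questions.
web_questions = ["how long to bake salmon", "car engine makes noise"]
predicted = [clf.predict(q) for q in web_questions]
```

Skipping out-of-vocabulary words is a design choice of this sketch: web search questions are short and often contain words absent from the CQA training set, and ignoring them keeps the decision driven by the words both sources share.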