On the Additivity and Weak Baselines for Search Result Diversification Research
A recent study on the topic of additivity addresses the task of search result diversification and concludes that while weaker baselines are almost always significantly improved by the evaluated diversification methods, for stronger baselines just the opposite happens, i.e., no significant improvement can be observed. Due to the importance of the issue in shaping future research directions and evaluation strategies in search result diversification, in this work we first aim to reproduce the findings reported in the previous study, and then investigate its possible limitations. Our extensive experiments first reveal that, under the same experimental setting as the previous study, we reach similar results. Next, we hypothesize that for stronger baselines, the parameters of some methods (i.e., the trade-off parameter between the relevance and diversity of the results in this particular scenario) should be tuned in a more fine-grained manner. With trade-off parameters that are specifically determined for each baseline run, we show that the percentage of significant improvements, even over the strong baselines, can be doubled. As a further issue, we discuss the possible impact of using the same strong baseline retrieval function for the diversity computations of the methods. Our takeaway message is that in the case of a strong baseline, it is more crucial to tune the parameters of the diversification methods to be evaluated; but once this is done, additivity is achievable.
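The role of a relevance/diversity trade-off parameter can be illustrated with a small sketch. The abstract does not name the diversification methods it evaluates, so the greedy MMR-style re-ranker below is a swapped-in illustration only; the function name `mmr_rerank`, the parameter `lam`, and the toy scores are all assumptions.

```python
def mmr_rerank(relevance, similarity, lam, k):
    """Greedy MMR-style re-ranking (illustrative, not the paper's method).

    relevance:  dict mapping document id -> baseline relevance score
    similarity: function (doc, doc) -> similarity in [0, 1]
    lam:        trade-off; 1.0 keeps the baseline order, lower values
                penalize documents similar to those already selected
    k:          number of documents to return
    """
    selected = []
    candidates = set(relevance)
    while candidates and len(selected) < k:
        def score(doc):
            # Redundancy is the highest similarity to any selected document.
            redundancy = max((similarity(doc, s) for s in selected), default=0.0)
            return lam * relevance[doc] - (1.0 - lam) * redundancy
        # sorted() makes tie-breaking deterministic.
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy data: "a" and "b" cover the same aspect, "c" covers another one.
toy_relevance = {"a": 0.9, "b": 0.85, "c": 0.5}
toy_aspect = {"a": "x", "b": "x", "c": "y"}

def toy_similarity(d1, d2):
    return 1.0 if toy_aspect[d1] == toy_aspect[d2] else 0.0

baseline_order = mmr_rerank(toy_relevance, toy_similarity, lam=1.0, k=3)
diversified_order = mmr_rerank(toy_relevance, toy_similarity, lam=0.5, k=3)
```

Sweeping `lam` separately for each baseline run, rather than fixing one value for all runs, is one plausible reading of the per-baseline tuning the abstract describes.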
Clustering of twitter technology tweets and the impact of stopwords on clusters
The year 2010 could be termed the year in which Twitter became completely mainstream. Twitter, which started as a means of communicating with friends, has grown far beyond its beginnings. Companies now use it to promote new products, and the movie industry uses it to promote movies. A great deal of advertising and branding is now tied to Twitter and, most importantly, when news breaks, Twitter is the first place people search for it. Whether it was the Mumbai attacks of 2008, the minor earthquakes in the Bay Area in 2010, or the "Twitter revolution" caused by the Iran elections, most viewers, tech-savvy or not, followed Twitter rather than mainstream news channels. In fact, most breaking news now appears on Twitter rather than in traditional mainstream media because of its huge user base. The focus of this paper is clustering daily technology news tweets from prominent bloggers and news sites, using TF-IDF weighting and Apache Mahout, and evaluating the effects of introducing and removing stop words on the quality of clustering. This project restricts itself to tweets in the English language.
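As a rough illustration of how stop words interact with TF-IDF weighting, the sketch below computes TF-IDF vectors with and without a stop-word filter. It is plain Python rather than Apache Mahout, and the tiny stop-word list, the example tweets, and the function name `tfidf_vectors` are assumptions made for illustration.

```python
import math
from collections import Counter

# Deliberately tiny, illustrative stop-word list (an assumption).
STOP_WORDS = frozenset({"the", "a", "is", "on", "of", "and"})

def tfidf_vectors(docs, stop_words=frozenset()):
    """Return one {term: tf-idf weight} dict per document.

    tf  = term count / document length (after filtering)
    idf = log(N / document frequency)
    """
    tokenized = [[w for w in d.lower().split() if w not in stop_words]
                 for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(w for tokens in tokenized for w in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({w: (c / len(tokens)) * math.log(n_docs / doc_freq[w])
                        for w, c in counts.items()})
    return vectors


tweets = [
    "the new phone is fast",
    "the new tablet is cheap",
    "the cloud service is down",
]
with_stop_words = tfidf_vectors(tweets)
without_stop_words = tfidf_vectors(tweets, STOP_WORDS)
```

In this toy corpus the stop words already receive zero weight because they occur in every tweet, but they still lengthen each document and so dilute the weights of the content terms; removing them raises the weight of terms such as "phone". Whether that helps or hurts cluster quality is the empirical question the abstract poses.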
Can Google Trends search queries contribute to risk diversification?
Portfolio diversification and active risk management are essential parts of
financial analysis which became even more crucial (and questioned) during and
after the years of the Global Financial Crisis. We propose a novel approach to
portfolio diversification using the information of searched items on Google
Trends. The diversification is based on an idea that popularity of a stock
measured by search queries is correlated with the stock riskiness. We penalize
the popular stocks by assigning them lower portfolio weights and we bring
forward the less popular, or peripheral, stocks to decrease the total riskiness
of the portfolio. Our results indicate that such a strategy dominates both the
benchmark index and the uniformly weighted portfolio both in-sample and
out-of-sample. Comment: 11 pages, 3 figures
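The idea of assigning lower weights to more popular stocks can be sketched as follows. The abstract does not give the weighting formula, so the power-law form, the exponent `alpha`, the function name, and the ticker names and search volumes below are all assumptions for illustration.

```python
def popularity_penalized_weights(search_volume, alpha=1.0):
    """Weight each stock proportionally to search_volume ** (-alpha).

    search_volume: dict mapping ticker -> positive search-volume measure
    alpha:         penalty strength (assumed form); alpha = 0 gives the
                   uniformly weighted portfolio, larger values shift
                   weight toward less-searched stocks
    """
    raw = {ticker: volume ** (-alpha) for ticker, volume in search_volume.items()}
    total = sum(raw.values())
    # Normalize so the weights sum to one.
    return {ticker: value / total for ticker, value in raw.items()}


# Hypothetical tickers and search volumes.
volumes = {"AAA": 100.0, "BBB": 50.0, "CCC": 10.0}
penalized = popularity_penalized_weights(volumes, alpha=1.0)
uniform = popularity_penalized_weights(volumes, alpha=0.0)
```

With `alpha=1.0` the least-searched stock ("CCC") receives the largest weight, while `alpha=0.0` recovers the uniform benchmark mentioned in the abstract.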
Determining citizens' opinions about stories in the news media: analysing Google, Facebook and Twitter
We describe a method whereby a governmental policy maker can discover citizens' reaction to news stories. This is particularly relevant in the political world, where governments' policy statements are reported by the news media and discussed by citizens. The work here addresses two main questions: whereabouts are citizens discussing a news story, and what are they saying? Our strategy to answer the first question is to find news articles pertaining to the policy statements, then perform internet searches for references to the news articles' headlines and URLs. We have created a software tool that schedules repeating Google searches for the news articles and collects the results in a database, enabling the user to aggregate and analyse them to produce ranked tables of sites that reference the news articles. Using data mining techniques we can analyse data so that resultant ranking reflects an overall aggregate score, taking into account multiple datasets, and this shows the most relevant places on the internet where the story is discussed. To answer the second question, we introduce the WeGov toolbox as a tool for analysing citizens' comments and behaviour pertaining to news stories. We first use the tool for identifying social network discussions, using different strategies for Facebook and Twitter. We apply different analysis components to analyse the data to distil the essence of the social network users' comments, to determine influential users and identify important comments.
Temporal models for mining, ranking and recommendation in the Web
Due to their first-hand, diverse and evolution-aware reflection of nearly all areas of life, heterogeneous temporal datasets, i.e., the Web, collaborative knowledge bases and social networks, have emerged as gold mines for content analytics of many sorts. In those collections, time plays an essential role in many crucial information retrieval and data mining tasks, ranging from user intent understanding and document ranking to advanced recommendations. There are two semantically close and important constituents when modeling along the time dimension, i.e., entity and event. Time crucially serves as the context for changes driven by happenings and phenomena (events) that relate to people, organizations or places (so-called entities) in our social lives. Thus, determining what users expect, or in other words, resolving the uncertainty confounded by temporal changes, is a compelling task to support consistent user satisfaction.
In this thesis, we address the aforementioned issues and propose temporal models that capture the temporal dynamics of such entities and events to serve for the end tasks. Specifically, we make the following contributions in this thesis:
(1) Query recommendation and document ranking in the Web - we address the issues for suggesting entity-centric queries and ranking effectiveness surrounding the happening time period of an associated event. In particular, we propose a multi-criteria optimization framework that facilitates the combination of multiple temporal models to smooth out the abrupt changes when transitioning between event phases for the former and a probabilistic approach for search result diversification of temporally ambiguous queries for the latter.
(2) Entity relatedness in Wikipedia - we study the long-term dynamics of Wikipedia as a global memory place for high-impact events, specifically the reviving memories of past events. Additionally, we propose a neural network-based approach to measure the temporal relatedness of entities and events. The model engages different latent representations of an entity (i.e., from time, link-based graph and content) and uses the collective attention from user navigation as the supervision.
(3) Graph-based ranking and temporal anchor-text mining in Web Archives - we tackle the problem of discovering important documents along the time-span of Web Archives, leveraging the link graph. Specifically, we combine the problems of relevance, temporal authority, diversity and time in a unified framework. The model accounts for the incomplete link structure and natural time lagging in Web Archives in mining the temporal authority.
(4) Methods for enhancing predictive models at an early stage in the social media and clinical domains - we investigate several methods to control model instability and enrich contexts of predictive models in the "cold-start" period. We demonstrate their effectiveness for the rumor detection and blood glucose prediction cases respectively.
Overall, the findings presented in this thesis demonstrate the importance of tracking the temporal dynamics surrounding salient events and entities for IR applications. We show that determining such changes in time-based patterns and trends in prevalent temporal collections can better satisfy user expectations, and boost ranking and recommendation effectiveness over time.
Pinterest Board Recommendation for Twitter Users
Pinboards on Pinterest are an emerging medium for engaging online social media
users, on which users post online images on specific topics. Despite their
significance, little previous work specifically aims to facilitate
information discovery based on pinboards. This paper proposes a novel pinboard
recommendation system for Twitter users. In order to associate contents from
the two social media platforms, we propose to use MultiLabel classification to
map Twitter user followees to pinboard topics and visual diversification to
recommend pinboards given the topics a user is interested in. A preliminary
experiment on a dataset with 2000 users validated our proposed system.
Research Paper Recommender System with Serendipity Using Tweets vs. Diversification
21st International Conference on Asia-Pacific Digital Libraries, ICADL 2019, Kuala Lumpur, Malaysia, November 4–7, 2019. Part of the Lecture Notes in Computer Science book series (LNCS, volume 11853), also part of the Information Systems and Applications, incl. Internet/Web, and HCI book sub series (LNISA, volume 11853). Many works have studied research paper recommender systems. However, most of them have focused only on accuracy and ignored serendipity, which is an important aspect of user satisfaction. Serendipity is concerned with the novelty of recommendations and the extent to which recommendations positively surprise users. In this paper, we investigate a research paper recommender system focusing on serendipity. In particular, we examine (1) whether a user's tweets lead to the generation of serendipitous recommendations and (2) whether the use of diversification on a recommendation list improves serendipity. We have conducted an online experiment with 22 subjects in the domain of computer science. The result of our experiment shows that tweets do not improve serendipity, despite their heterogeneous nature. However, diversification delivers serendipitous research papers that cannot be generated by a traditional strategy.