8,295 research outputs found
The contribution of data mining to information science
The information explosion is a serious challenge for current information institutions. On the other hand, data mining, which is the search for valuable information in large volumes of data, is one of the solutions to face this challenge. In the past several years, data mining has made a significant contribution to the field of information science. This paper examines the impact of data mining by reviewing existing applications, including personalized environments, electronic commerce, and search engines. For these three types of application, how data mining can enhance their functions is discussed. The reader of this paper is expected to get an overview of the state of the art research associated with these applications. Furthermore, we identify the limitations of current work and raise several directions for future research
Automated user modeling for personalized digital libraries
Digital libraries (DL) have become one of the most typical ways of accessing any kind of digitalized information. Due to this key role, users welcome any improvements on the services they receive from digital libraries. One trend used to
improve digital services is through personalization. Up to now, the most common approach for personalization in digital libraries has been user-driven. Nevertheless, the design of efficient personalized services has to be done, at least in part, in
an automatic way. In this context, machine learning techniques automate the process of constructing user models. This paper proposes a new approach to construct digital libraries that satisfy user’s necessity for information: Adaptive Digital Libraries, libraries that automatically learn user preferences and goals and personalize their interaction using this information
Generating Query Suggestions to Support Task-Based Search
We address the problem of generating query suggestions to support users in
completing their underlying tasks (which motivated them to search in the first
place). Given an initial query, these query suggestions should provide a
coverage of possible subtasks the user might be looking for. We propose a
probabilistic modeling framework that obtains keyphrases from multiple sources
and generates query suggestions from these keyphrases. Using the test suites of
the TREC Tasks track, we evaluate and analyze each component of our model.Comment: Proceedings of the 40th International ACM SIGIR Conference on
Research and Development in Information Retrieval (SIGIR '17), 201
Negative Statements Considered Useful
Knowledge bases (KBs), pragmatic collections of knowledge about notable entities, are an important asset in applications such as search, question answering and dialogue. Rooted in a long tradition in knowledge representation, all popular KBs only store positive information, while they abstain from taking any stance towards statements not contained in them. In this paper, we make the case for explicitly stating interesting statements which are not true. Negative statements would be important to overcome current limitations of question answering, yet due to their potential abundance, any effort towards compiling them needs a tight coupling with ranking. We introduce two approaches towards compiling negative statements. (i) In peer-based statistical inferences, we compare entities with highly related entities in order to derive potential negative statements, which we then rank using supervised and unsupervised features. (ii) In query-log-based text extraction, we use a pattern-based approach for harvesting search engine query logs. Experimental results show that both approaches hold promising and complementary potential. Along with this paper, we publish the first datasets on interesting negative information, containing over 1.1M statements for 100K popular Wikidata entities
Shuffling a Stacked Deck: The Case for Partially Randomized Ranking of Search Engine Results
In-degree, PageRank, number of visits and other measures of Web page
popularity significantly influence the ranking of search results by modern
search engines. The assumption is that popularity is closely correlated with
quality, a more elusive concept that is difficult to measure directly.
Unfortunately, the correlation between popularity and quality is very weak for
newly-created pages that have yet to receive many visits and/or in-links.
Worse, since discovery of new content is largely done by querying search
engines, and because users usually focus their attention on the top few
results, newly-created but high-quality pages are effectively ``shut out,'' and
it can take a very long time before they become popular.
We propose a simple and elegant solution to this problem: the introduction of
a controlled amount of randomness into search result ranking methods. Doing so
offers new pages a chance to prove their worth, although clearly using too much
randomness will degrade result quality and annul any benefits achieved. Hence
there is a tradeoff between exploration to estimate the quality of new pages
and exploitation of pages already known to be of high quality. We study this
tradeoff both analytically and via simulation, in the context of an economic
objective function based on aggregate result quality amortized over time. We
show that a modest amount of randomness leads to improved search results
- …