44,502 research outputs found
Integrating E-Commerce and Data Mining: Architecture and Challenges
We show that the e-commerce domain can provide all the right ingredients for
successful data mining and claim that it is a killer domain for data mining. We
describe an integrated architecture, based on our expe-rience at Blue Martini
Software, for supporting this integration. The architecture can dramatically
reduce the pre-processing, cleaning, and data understanding effort often
documented to take 80% of the time in knowledge discovery projects. We
emphasize the need for data collection at the application server layer (not the
web server) in order to support logging of data and metadata that is essential
to the discovery process. We describe the data transformation bridges required
from the transaction processing systems and customer event streams (e.g.,
clickstreams) to the data warehouse. We detail the mining workbench, which
needs to provide multiple views of the data through reporting, data mining
algorithms, visualization, and OLAP. We con-clude with a set of challenges.Comment: KDD workshop: WebKDD 200
A Hybrid Web Recommendation System based on the Improved Association Rule Mining Algorithm
As the growing interest of web recommendation systems those are applied to
deliver customized data for their users, we started working on this system.
Generally the recommendation systems are divided into two major categories such
as collaborative recommendation system and content based recommendation system.
In case of collaborative recommen-dation systems, these try to seek out users
who share same tastes that of given user as well as recommends the websites
according to the liking given user. Whereas the content based recommendation
systems tries to recommend web sites similar to those web sites the user has
liked. In the recent research we found that the efficient technique based on
asso-ciation rule mining algorithm is proposed in order to solve the problem of
web page recommendation. Major problem of the same is that the web pages are
given equal importance. Here the importance of pages changes according to the
fre-quency of visiting the web page as well as amount of time user spends on
that page. Also recommendation of newly added web pages or the pages those are
not yet visited by users are not included in the recommendation set. To
over-come this problem, we have used the web usage log in the adaptive
association rule based web mining where the asso-ciation rules were applied to
personalization. This algorithm was purely based on the Apriori data mining
algorithm in order to generate the association rules. However this method also
suffers from some unavoidable drawbacks. In this paper we are presenting and
investigating the new approach based on weighted Association Rule Mining
Algorithm and text mining. This is improved algorithm which adds semantic
knowledge to the results, has more efficiency and hence gives better quality
and performances as compared to existing approaches.Comment: 9 pages, 7 figures, 2 table
Extracting Hierarchies of Search Tasks & Subtasks via a Bayesian Nonparametric Approach
A significant amount of search queries originate from some real world
information need or tasks. In order to improve the search experience of the end
users, it is important to have accurate representations of tasks. As a result,
significant amount of research has been devoted to extracting proper
representations of tasks in order to enable search systems to help users
complete their tasks, as well as providing the end user with better query
suggestions, for better recommendations, for satisfaction prediction, and for
improved personalization in terms of tasks. Most existing task extraction
methodologies focus on representing tasks as flat structures. However, tasks
often tend to have multiple subtasks associated with them and a more
naturalistic representation of tasks would be in terms of a hierarchy, where
each task can be composed of multiple (sub)tasks. To this end, we propose an
efficient Bayesian nonparametric model for extracting hierarchies of such tasks
\& subtasks. We evaluate our method based on real world query log data both
through quantitative and crowdsourced experiments and highlight the importance
of considering task/subtask hierarchies.Comment: 10 pages. Accepted at SIGIR 2017 as a full pape
Evaluating tag-based information access in image collections
The availability of social tags has greatly enhanced access to information. Tag clouds have emerged as a new "social" way to find and visualize information, providing both one-click access to information and a snapshot of the "aboutness" of a tagged collection. A range of research projects explored and compared different tag artifacts for information access ranging from regular tag clouds to tag hierarchies. At the same time, there is a lack of user studies that compare the effectiveness of different types of tag-based browsing interfaces from the users point of view. This paper contributes to the research on tag-based information access by presenting a controlled user study that compared three types of tag-based interfaces on two recognized types of search tasks - lookup and exploratory search. Our results demonstrate that tag-based browsing interfaces significantly outperform traditional search interfaces in both performance and user satisfaction. At the same time, the differences between the two types of tag-based browsing interfaces explored in our study are not as clear. Copyright 2012 ACM
Contextualised Browsing in a Digital Library's Living Lab
Contextualisation has proven to be effective in tailoring \linebreak search
results towards the users' information need. While this is true for a basic
query search, the usage of contextual session information during exploratory
search especially on the level of browsing has so far been underexposed in
research. In this paper, we present two approaches that contextualise browsing
on the level of structured metadata in a Digital Library (DL), (1) one variant
bases on document similarity and (2) one variant utilises implicit session
information, such as queries and different document metadata encountered during
the session of a users. We evaluate our approaches in a living lab environment
using a DL in the social sciences and compare our contextualisation approaches
against a non-contextualised approach. For a period of more than three months
we analysed 47,444 unique retrieval sessions that contain search activities on
the level of browsing. Our results show that a contextualisation of browsing
significantly outperforms our baseline in terms of the position of the first
clicked item in the result set. The mean rank of the first clicked document
(measured as mean first relevant - MFR) was 4.52 using a non-contextualised
ranking compared to 3.04 when re-ranking the result lists based on similarity
to the previously viewed document. Furthermore, we observed that both
contextual approaches show a noticeably higher click-through rate. A
contextualisation based on document similarity leads to almost twice as many
document views compared to the non-contextualised ranking.Comment: 10 pages, 2 figures, paper accepted at JCDL 201
- …