Integrating E-Commerce and Data Mining: Architecture and Challenges
We show that the e-commerce domain can provide all the right ingredients for
successful data mining and claim that it is a killer domain for data mining. We
describe an integrated architecture, based on our experience at Blue Martini
Software, for supporting this integration. The architecture can dramatically
reduce the pre-processing, cleaning, and data understanding effort often
documented to take 80% of the time in knowledge discovery projects. We
emphasize the need for data collection at the application server layer (not the
web server) in order to support logging of data and metadata that is essential
to the discovery process. We describe the data transformation bridges required
from the transaction processing systems and customer event streams (e.g.,
clickstreams) to the data warehouse. We detail the mining workbench, which
needs to provide multiple views of the data through reporting, data mining
algorithms, visualization, and OLAP. We conclude with a set of challenges. Comment: KDD workshop: WebKDD 200
Analysis & Visualization of EHR Patient Portal Clickstream Data
This paper analyzes EHR patient portal clickstream data to determine patient usage behavior. We present the patterns found in patient clickstream data. Using directed and undirected data mining approaches, we explore whether different patient groups appear to use the portal differently. We examine changes in usage over time and explore differences in usage, average number of clicks per session, and time spent per page by age and gender. We then use clustering to create groups that discriminate patients by their portal usage behavior. Knowledge of these usage patterns can help service providers understand the demographics and behavioral aspects of their patients, which in turn can help them develop, enhance, and improve their systems to make the best use of these portals.
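The per-session measures mentioned in this abstract can be illustrated with a minimal sketch. It assumes click events arrive as (user, timestamp in seconds, page) tuples and that a session ends after 30 minutes of inactivity; both the record layout and the timeout are assumptions for illustration, not details from the paper.

```python
from collections import defaultdict

SESSION_GAP_SECONDS = 30 * 60  # assumed inactivity timeout, not from the paper


def sessionize(events, gap=SESSION_GAP_SECONDS):
    """Group (user, timestamp_seconds, page) events into per-user sessions."""
    by_user = defaultdict(list)
    for user, ts, page in events:
        by_user[user].append((ts, page))

    sessions = defaultdict(list)
    for user, clicks in by_user.items():
        clicks.sort()  # order each user's clicks by timestamp
        current = [clicks[0]]
        for prev, cur in zip(clicks, clicks[1:]):
            if cur[0] - prev[0] > gap:
                # Gap too long: close the running session, start a new one.
                sessions[user].append(current)
                current = []
            current.append(cur)
        sessions[user].append(current)
    return dict(sessions)


def avg_clicks_per_session(sessions):
    """Mean number of clicks per session, computed separately for each user."""
    return {
        user: sum(len(s) for s in user_sessions) / len(user_sessions)
        for user, user_sessions in sessions.items()
    }
```

Per-user averages like these could then be grouped by age or gender, or passed to a clustering step, in the spirit of the analysis described above.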
The Metabolism and Growth of Web Forums
We view web forums as virtual living organisms feeding on users' attention
and investigate how these organisms grow at the expense of collective
attention. We find that the "body mass" () and "energy consumption" ()
of the studied forums exhibit the allometric growth property, i.e., . This implies that within a forum, the network transporting
attention flow between threads has a time-invariant structure, despite
the continuous change of the nodes (threads) and edges (clickstreams). The
observed time-invariant topology allows us to explain the dynamics of networks
by the behavior of threads. In particular, we describe the clickstream
dissipation on threads using the function , in which
is the clickstreams to node and is the clickstream dissipated
from . It turns out that , an indicator for dissipation efficiency,
is negatively correlated with and sets the lower boundary
for . Our findings have practical consequences. For example,
can be used as a measure of the "stickiness" of forums, because it quantifies
the stable ability of forums to convert into , i.e., to keep users
"locked in" to the forum. Meanwhile, the correlation between and
provides a convenient method to evaluate the "stickiness" of forums. Finally,
we discuss an optimized "body mass" of forums at around that minimizes
and maximizes . Comment: 6 figures
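Allometric growth is conventionally modeled as a power law between two quantities, y ≈ c · x^k. As a rough sketch under that assumption, the exponent can be estimated by ordinary least squares on log-transformed data. The names `xs`, `ys`, `k`, and `c` are placeholders chosen here and are not the paper's notation.

```python
import math


def fit_power_law_exponent(xs, ys):
    """Fit y = c * x**k by least squares on (log x, log y); return (k, c)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    # Slope of the regression line in log-log space is the exponent k.
    sxx = sum((a - mean_x) ** 2 for a in log_x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_x, log_y))
    k = sxy / sxx
    # Intercept in log-log space is log(c).
    c = math.exp(mean_y - k * mean_x)
    return k, c
```

All inputs must be strictly positive for the logarithms to be defined.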
Customer purchase behavior prediction in E-commerce: a conceptual framework and research agenda
Digital retailers are experiencing an increasing number of online transactions from their consumers, a consequence of the convenience of buying goods via E-commerce platforms. Such interactions compose complex behavioral patterns which can be analyzed through predictive analytics to enable businesses to understand consumer needs. Despite this abundance of big data and possible tools to analyze it, a systematic review of the literature is missing. Therefore, this paper presents a systematic literature review of recent research dealing with customer purchase prediction in the E-commerce context. The main contributions are a novel analytical framework and a research agenda in the field. The framework reveals three main tasks in this review, namely, the prediction of customer intents, buying sessions, and purchase decisions. Those are followed by their employed predictive methodologies and are analyzed from three perspectives. Finally, the research agenda provides major existing issues for further research in the field of online purchase behavior prediction.
Binary Particle Swarm Optimization based Biclustering of Web usage Data
Web mining is the nontrivial process of discovering valid, novel, potentially
useful knowledge from web data using data mining techniques. It
may give information that is useful for improving the services offered by web
portals and information access and retrieval tools. With the rapid development
of biclustering, more researchers have applied the biclustering technique to
different fields in recent years. When the biclustering approach is applied to
web usage data, it automatically captures the hidden browsing patterns
in the form of biclusters. In this work, a swarm intelligence technique is
combined with the biclustering approach to propose an algorithm called Binary
Particle Swarm Optimization (BPSO) based Biclustering for Web Usage Data. The
main objective of this algorithm is to retrieve the globally optimal bicluster
from the web usage data. These biclusters contain relationships between web
users and web pages, which are useful for E-Commerce applications such as web
advertising and marketing. Experiments are conducted on a real dataset to
demonstrate the efficiency of the proposed algorithms.
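The abstract does not spell out the particle encoding or the fitness function, so the following is only a sketch of how binary PSO and biclustering might fit together. It assumes a particle is a bit string selecting rows (users) and columns (pages), scores coherence with the Cheng-Church mean squared residue, and uses a common binary PSO update with an inertia weight and a sigmoid transfer function. All function names and parameter values are hypothetical.

```python
import math
import random


def mean_squared_residue(matrix, rows, cols):
    """Cheng-Church mean squared residue of a submatrix (lower = more coherent)."""
    row_mean = {i: sum(matrix[i][j] for j in cols) / len(cols) for i in rows}
    col_mean = {j: sum(matrix[i][j] for i in rows) / len(rows) for j in cols}
    total_mean = sum(row_mean.values()) / len(rows)
    return sum(
        (matrix[i][j] - row_mean[i] - col_mean[j] + total_mean) ** 2
        for i in rows
        for j in cols
    ) / (len(rows) * len(cols))


def decode(bits, n_rows):
    """Split a particle's bit string into selected row and column indices."""
    rows = [i for i, b in enumerate(bits[:n_rows]) if b]
    cols = [j for j, b in enumerate(bits[n_rows:]) if b]
    return rows, cols


def bpso_step(bits, velocity, personal_best, global_best, rng,
              inertia=0.7, c1=1.5, c2=1.5, v_max=4.0):
    """One binary PSO update: move velocities, then resample each bit."""
    new_bits, new_velocity = [], []
    for x, v, p, g in zip(bits, velocity, personal_best, global_best):
        # Pull the velocity toward the personal and global best positions.
        v = inertia * v + c1 * rng.random() * (p - x) + c2 * rng.random() * (g - x)
        v = max(-v_max, min(v_max, v))  # clamp to keep the sigmoid from saturating
        # The sigmoid of the velocity is the probability that the bit becomes 1.
        prob_one = 1.0 / (1.0 + math.exp(-v))
        new_velocity.append(v)
        new_bits.append(1 if rng.random() < prob_one else 0)
    return new_bits, new_velocity
```

A full search would repeat `bpso_step` over a swarm, decode each particle, and keep the best-scoring biclusters. In practice a fitness would also need to reward bicluster size, since very small submatrices trivially achieve a low residue.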
Dropout Model Evaluation in MOOCs
The field of learning analytics needs to adopt a more rigorous approach for
predictive model evaluation that matches the complex practice of
model-building. In this work, we present a procedure to statistically test
hypotheses about model performance which goes beyond the state-of-the-practice
in the community to analyze both algorithms and feature extraction methods from
raw data. We apply this method to a series of algorithms and feature sets
derived from a large sample of Massive Open Online Courses (MOOCs). While a
complete comparison of all potential modeling approaches is beyond the scope of
this paper, we show that this approach reveals a large gap in dropout
prediction performance between forum-, assignment-, and clickstream-based
feature extraction methods, where the latter is significantly better than the
former two, which are in turn indistinguishable from one another. This work has
methodological implications for evaluating predictive or AI-based models of
student success, and practical implications for the design and targeting of
at-risk student models and interventions.
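The abstract does not name the statistical procedure it uses. As a stand-in illustration of testing a hypothesis about model performance, the sketch below runs an exact paired sign-flip permutation test on per-course scores (for example AUC) of two models. The choice of test, the function name, and the example data are assumptions made here, not the paper's method.

```python
from itertools import product


def paired_sign_flip_test(scores_a, scores_b):
    """Exact two-sided sign-flip permutation test on paired score differences.

    Under the null hypothesis that the two models perform equally, each
    paired difference is equally likely to be positive or negative.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    # Enumerate every assignment of signs to the differences (2**n cases).
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(sum(s * d for s, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:  # tolerance for float round-off
            extreme += 1
    return extreme / total
```

The enumeration grows as 2**n, so this exact form only suits a modest number of paired courses; larger samples would call for random sampling of sign assignments.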