1,699 research outputs found

    Integrating E-Commerce and Data Mining: Architecture and Challenges

    Full text link
    We show that the e-commerce domain can provide all the right ingredients for successful data mining and claim that it is a killer domain for data mining. We describe an integrated architecture, based on our expe-rience at Blue Martini Software, for supporting this integration. The architecture can dramatically reduce the pre-processing, cleaning, and data understanding effort often documented to take 80% of the time in knowledge discovery projects. We emphasize the need for data collection at the application server layer (not the web server) in order to support logging of data and metadata that is essential to the discovery process. We describe the data transformation bridges required from the transaction processing systems and customer event streams (e.g., clickstreams) to the data warehouse. We detail the mining workbench, which needs to provide multiple views of the data through reporting, data mining algorithms, visualization, and OLAP. We con-clude with a set of challenges.Comment: KDD workshop: WebKDD 200

    Analysis & Visualization of EHR Patient Portal Clickstream Data

    Get PDF
    The purpose of this paper is the analysis of EHR clickstream data of patient portal to determine patient usage behavior. We present our analysis of patterns found in patient clickstream data. Using directed and undirected data mining approach, data can be explored to examine whether different patient groups appear to use the portal differently. We examine changes in usage over time, and also explore difference in usage, average number of clicks per session and time spent per page based on age and gender. We then use clustering to create groups that discriminate patients by their portal usage behavior. Knowledge of these usage patterns can help service providers understand the demographics and behavioral aspects of their patients, which in turn can help them develop, enhance and improve their systems to make the best use of these portals

    The Metabolism and Growth of Web Forums

    Full text link
    We view web forums as virtual living organisms feeding on user's attention and investigate how these organisms grow at the expense of collective attention. We find that the "body mass" (PVPV) and "energy consumption" (UVUV) of the studied forums exhibits the allometric growth property, i.e., PVtUVtθPV_t \sim UV_t ^ \theta. This implies that within a forum, the network transporting attention flow between threads has a structure invariant of time, despite of the continuously changing of the nodes (threads) and edges (clickstreams). The observed time-invariant topology allows us to explain the dynamics of networks by the behavior of threads. In particular, we describe the clickstream dissipation on threads using the function DiTiγD_i \sim T_i ^ \gamma, in which TiT_i is the clickstreams to node ii and DiD_i is the clickstream dissipated from ii. It turns out that γ\gamma, an indicator for dissipation efficiency, is negatively correlated with θ\theta and 1/γ1/\gamma sets the lower boundary for θ\theta. Our findings have practical consequences. For example, θ\theta can be used as a measure of the "stickiness" of forums, because it quantifies the stable ability of forums to convert UVUV into PVPV, i.e., to remain users "lock-in" the forum. Meanwhile, the correlation between γ\gamma and θ\theta provides a convenient method to evaluate the `stickiness" of forums. Finally, we discuss an optimized "body mass" of forums at around 10510^5 that minimizes γ\gamma and maximizes θ\theta.Comment: 6 figure

    Customer purchase behavior prediction in E-commerce: a conceptual framework and research agenda

    Get PDF
    Digital retailers are experiencing an increasing number of transactions coming from their consumers online, a consequence of the convenience in buying goods via E-commerce platforms. Such interactions compose complex behavioral patterns which can be analyzed through predictive analytics to enable businesses to understand consumer needs. In this abundance of big data and possible tools to analyze them, a systematic review of the literature is missing. Therefore, this paper presents a systematic literature review of recent research dealing with customer purchase prediction in the E-commerce context. The main contributions are a novel analytical framework and a research agenda in the field. The framework reveals three main tasks in this review, namely, the prediction of customer intents, buying sessions, and purchase decisions. Those are followed by their employed predictive methodologies and are analyzed from three perspectives. Finally, the research agenda provides major existing issues for further research in the field of purchase behavior prediction online

    Binary Particle Swarm Optimization based Biclustering of Web usage Data

    Full text link
    Web mining is the nontrivial process to discover valid, novel, potentially useful knowledge from web data using the data mining techniques or methods. It may give information that is useful for improving the services offered by web portals and information access and retrieval tools. With the rapid development of biclustering, more researchers have applied the biclustering technique to different fields in recent years. When biclustering approach is applied to the web usage data it automatically captures the hidden browsing patterns from it in the form of biclusters. In this work, swarm intelligent technique is combined with biclustering approach to propose an algorithm called Binary Particle Swarm Optimization (BPSO) based Biclustering for Web Usage Data. The main objective of this algorithm is to retrieve the global optimal bicluster from the web usage data. These biclusters contain relationships between web users and web pages which are useful for the E-Commerce applications like web advertising and marketing. Experiments are conducted on real dataset to prove the efficiency of the proposed algorithms

    Dropout Model Evaluation in MOOCs

    Full text link
    The field of learning analytics needs to adopt a more rigorous approach for predictive model evaluation that matches the complex practice of model-building. In this work, we present a procedure to statistically test hypotheses about model performance which goes beyond the state-of-the-practice in the community to analyze both algorithms and feature extraction methods from raw data. We apply this method to a series of algorithms and feature sets derived from a large sample of Massive Open Online Courses (MOOCs). While a complete comparison of all potential modeling approaches is beyond the scope of this paper, we show that this approach reveals a large gap in dropout prediction performance between forum-, assignment-, and clickstream-based feature extraction methods, where the latter is significantly better than the former two, which are in turn indistinguishable from one another. This work has methodological implications for evaluating predictive or AI-based models of student success, and practical implications for the design and targeting of at-risk student models and interventions
    corecore