19,315 research outputs found

    CASP-DM: Context Aware Standard Process for Data Mining

    Get PDF
    We propose an extension of the Cross Industry Standard Process for Data Mining (CRISPDM) which addresses specific challenges of machine learning and data mining for context and model reuse handling. This new general context-aware process model is mapped with CRISP-DM reference model proposing some new or enhanced outputs

    Process Framework for Subscriber Management and Retention in Nigerian Telecommunication Industry

    Get PDF
    in the global telecommunication industry. Hence, a dominant approach for subscriber management and retention is churn control, since it is cheaper to retain an existing subscriber than acquiring a new one. Predictive modeling employs the use of data mining techniques to identify patterns and provide a result that a group of subscribers are likely to churn in the near future. However, the effectiveness of subscriber retention strategy in an organization can be further boosted if the reason for churn and the timing of churn can also be predicted. In this paper, we propose a data mining process framework that can be used to predict churn, determine when a subscriber is likely to churn, provides the reason why a subscriber may churn, and recommend appropriate intervention strategy for customer retention using a combination of statistical and machine learning techniques. This experiment is carried out using data from a major telecom operator in Nigeria

    Automatic Hyperparameter Tuning Method for Local Outlier Factor, with Applications to Anomaly Detection

    Full text link
    In recent years, there have been many practical applications of anomaly detection such as in predictive maintenance, detection of credit fraud, network intrusion, and system failure. The goal of anomaly detection is to identify in the test data anomalous behaviors that are either rare or unseen in the training data. This is a common goal in predictive maintenance, which aims to forecast the imminent faults of an appliance given abundant samples of normal behaviors. Local outlier factor (LOF) is one of the state-of-the-art models used for anomaly detection, but the predictive performance of LOF depends greatly on the selection of hyperparameters. In this paper, we propose a novel, heuristic methodology to tune the hyperparameters in LOF. A tuned LOF model that uses the proposed method shows good predictive performance in both simulations and real data sets.Comment: 15 pages, 5 figure

    Using webcrawling of publicly available websites to assess E-commerce relationships

    Get PDF
    We investigate e-commerce success factors concerning their impact on the success of commerce transactions between businesses companies. In scientific literature, many e-commerce success factors are introduced. Most of them are focused on companies' website quality. They are evaluated concerning companies' success in the business-to- consumer (B2C) environment where consumers choose their preferred e-commerce websites based on these success factors e.g. website content quality, website interaction, and website customization. In contrast to previous work, this research focuses on the usage of existing e-commerce success factors for predicting successfulness of business-to-business (B2B) ecommerce. The introduced methodology is based on the identification of semantic textual patterns representing success factors from the websites of B2B companies. The successfulness of the identified success factors in B2B ecommerce is evaluated by regression modeling. As a result, it is shown that some B2C e-commerce success factors also enable the predicting of B2B e-commerce success while others do not. This contributes to the existing literature concerning ecommerce success factors. Further, these findings are valuable for B2B e-commerce websites creation

    Re-mining item associations: methodology and a case study in apparel retailing

    Get PDF
    Association mining is the conventional data mining technique for analyzing market basket data and it reveals the positive and negative associations between items. While being an integral part of transaction data, pricing and time information have not been integrated into market basket analysis in earlier studies. This paper proposes a new approach to mine price, time and domain related attributes through re-mining of association mining results. The underlying factors behind positive and negative relationships can be characterized and described through this second data mining stage. The applicability of the methodology is demonstrated through the analysis of data coming from a large apparel retail chain, and its algorithmic complexity is analyzed in comparison to the existing techniques

    Yields and qualities of pigeonpea varieties grown under smallholder farmers’ conditions in Eastern and Southern Africa

    Get PDF
    Pigeonpea is one of the few crops with a high potential for resource-poor farmers due to its complementary resource use when intercropped with maize. A three year comprehensive comparative study on the performance of six pigeonpea (Cajanus cajan) varieties on farmers’ fields in Eastern and Southern Africa where intercropping with maize is normal practice, was undertaken. The varieties were tested for accumulation of dry matter (DM), nitrogen (N) and phosphorus (P) in all above-ground organs for three years under farmers’ conditions. The study revealed that the latest introduced ICEAP 00040 outperformed all the other tested varieties (ICP 9145; ICEAP 00020, ICEAP 00053, ICEAP 00068, and a local variety called “Babati White”) under farmer-managed conditions. The harvest indices (HI), ranging from 0.08-0.15 on dry matter (DM) basis, were relatively low and unaffected (P>0.05) by the environmental variation. The N harvest index (NHI) was 0.28 and P harvest index (PHI) was 0.19. The better responses of ICEAP00040 to favourable conditions could however only be realised in a minority of cases as yields generally were low. These low yields are still a major challenge in African smallholder agriculture as pulses play an important role in soil fertility maintenance as well as in the household diets

    Detecting variable responses in time-series using repeated measures ANOVA: Application to physiologic challenges.

    Get PDF
    We present an approach to analyzing physiologic timetrends recorded during a stimulus by comparing means at each time point using repeated measures analysis of variance (RMANOVA). The approach allows temporal patterns to be examined without an a priori model of expected timing or pattern of response. The approach was originally applied to signals recorded from functional magnetic resonance imaging (fMRI) volumes-of-interest (VOI) during a physiologic challenge, but we have used the same technique to analyze continuous recordings of other physiological signals such as heart rate, breathing rate, and pulse oximetry. For fMRI, the method serves as a complement to whole-brain voxel-based analyses, and is useful for detecting complex responses within pre-determined brain regions, or as a post-hoc analysis of regions of interest identified by whole-brain assessments. We illustrate an implementation of the technique in the statistical software packages R and SAS. VOI timetrends are extracted from conventionally preprocessed fMRI images. A timetrend of average signal intensity across the VOI during the scanning period is calculated for each subject. The values are scaled relative to baseline periods, and time points are binned. In SAS, the procedure PROC MIXED implements the RMANOVA in a single step. In R, we present one option for implementing RMANOVA with the mixed model function "lme". Model diagnostics, and predicted means and differences are best performed with additional libraries and commands in R; we present one example. The ensuing results allow determination of significant overall effects, and time-point specific within- and between-group responses relative to baseline. We illustrate the technique using fMRI data from two groups of subjects who underwent a respiratory challenge. RMANOVA allows insight into the timing of responses and response differences between groups, and so is suited to physiologic testing paradigms eliciting complex response patterns

    On-Disk Data Processing: Issues and Future Directions

    Get PDF
    In this paper, we present a survey of "on-disk" data processing (ODDP). ODDP, which is a form of near-data processing, refers to the computing arrangement where the secondary storage drives have the data processing capability. Proposed ODDP schemes vary widely in terms of the data processing capability, target applications, architecture and the kind of storage drive employed. Some ODDP schemes provide only a specific but heavily used operation like sort whereas some provide a full range of operations. Recently, with the advent of Solid State Drives, powerful and extensive ODDP solutions have been proposed. In this paper, we present a thorough review of architectures developed for different on-disk processing approaches along with current and future challenges and also identify the future directions which ODDP can take.Comment: 24 pages, 17 Figures, 3 Table

    Next challenges for adaptive learning systems

    Get PDF
    Learning from evolving streaming data has become a 'hot' research topic in the last decade and many adaptive learning algorithms have been developed. This research was stimulated by rapidly growing amounts of industrial, transactional, sensor and other business data that arrives in real time and needs to be mined in real time. Under such circumstances, constant manual adjustment of models is in-efficient and with increasing amounts of data is becoming infeasible. Nevertheless, adaptive learning models are still rarely employed in business applications in practice. In the light of rapidly growing structurally rich 'big data', new generation of parallel computing solutions and cloud computing services as well as recent advances in portable computing devices, this article aims to identify the current key research directions to be taken to bring the adaptive learning closer to application needs. We identify six forthcoming challenges in designing and building adaptive learning (pre-diction) systems: making adaptive systems scalable, dealing with realistic data, improving usability and trust, integrat-ing expert knowledge, taking into account various application needs, and moving from adaptive algorithms towards adaptive tools. Those challenges are critical for the evolving stream settings, as the process of model building needs to be fully automated and continuous.</jats:p
    corecore