418,502 research outputs found

    Statistical learning methods for mining marketing and biological data

    The value of data is now broadly recognized and emphasized. More and more decisions are made based on data and analysis rather than solely on experience and intuition. With the fast development of networking, data storage, and data collection capacity, data volumes have increased dramatically in industry, science, and engineering, which brings both great opportunities and challenges. To take advantage of the data flood, new computational methods are needed to process, analyze, and understand these datasets. This dissertation focuses on the development of statistical learning methods for online advertising and bioinformatics to model real-world data with temporal or spatial changes. First, a collaborated online change-point detection method is proposed to identify the change-points in sparse time series. It leverages signals from auxiliary time series, such as engagement metrics, to compensate for the sparse revenue data and to improve detection efficiency and accuracy through smart collaboration. Second, a task-specific multi-task learning algorithm is developed to model ever-changing video viewing behaviors. With ℓ1-regularized task-specific features and jointly estimated shared features, it allows different models to seek common ground while reserving differences. Third, an empirical Bayes method is proposed to identify 3' and 5' alternative splicing in RNA-seq data. It formulates alternative 3' and 5' splicing site selection as a change-point problem and provides, for the first time, a systematic framework to pool information across genes and to integrate various information when available, in particular the useful junction read information, in order to obtain better performance.
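
As a rough illustration of what online change-point detection in a time series involves, here is a minimal one-sided CUSUM sketch. It is a generic textbook technique, not the collaborated method the dissertation proposes, and the function name `cusum_detect` and its parameters (`target_mean`, `drift`, `threshold`) are assumptions chosen for this example.

```python
def cusum_detect(series, target_mean, drift, threshold):
    """Return the first index at which an upward mean shift is flagged, or None.

    Generic one-sided CUSUM (illustrative only, not the dissertation's method).
    target_mean: assumed in-control mean of the series.
    drift: slack subtracted at each step so small fluctuations are ignored.
    threshold: alarm level for the cumulative statistic.
    """
    stat = 0.0
    for t, x in enumerate(series):
        # Accumulate evidence of an upward shift; never let it go below zero.
        stat = max(0.0, stat + (x - target_mean - drift))
        if stat > threshold:
            return t
    return None


if __name__ == "__main__":
    # Hypothetical data: the mean jumps from 0 to 2 at index 5.
    data = [0, 0, 0, 0, 0, 2, 2, 2, 2]
    print(cusum_detect(data, target_mean=0.0, drift=0.5, threshold=3.0))  # prints 7
```

The alarm fires a couple of steps after the true shift because the statistic needs to accumulate enough evidence to cross the threshold. Lowering `threshold` shortens that delay at the cost of more false alarms.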

    Analytical study and computational modeling of statistical methods for data mining

    Today, there is a tremendous increase in the information available in electronic form, and it grows massively day by day. There are ample opportunities for research to retrieve knowledge from the data available in this information. Data mining and app

    A Statistical Toolbox For Mining And Modeling Spatial Data

    Most data mining projects in spatial economics start with an evaluation of a set of attribute variables on a sample of spatial entities, looking for the existence and strength of spatial autocorrelation, based on Moran's and Geary's coefficients. The adequacy of these coefficients is rarely challenged, even though many users seem likely to make mistakes and to foster confusion when reporting on their properties. My paper begins with a critical appraisal of the classical definition and rationale of these indices. I argue that, while intuitively founded, they are plagued by an inconsistency in their conception. Then, I propose a principled small change leading to corrected spatial autocorrelation coefficients, which strongly simplifies their relationship and opens the way to an augmented toolbox of statistical methods of dimension reduction and data visualization, also useful for modeling purposes. A second section presents a formal framework, adapted from recent work in statistical learning, which gives theoretical support to the definition of corrected spatial autocorrelation coefficients. More specifically, the multivariate data mining methods presented here are easy to implement in existing (free) software and make it practical to exploit the proposed corrections in spatial data analysis. From a mathematical point of view, their asymptotic behavior, already studied in a series of papers by Belkin & Niyogi, suggests that they are robust and have limited sensitivity to the Modifiable Areal Unit Problem (MAUP), both valuable in exploratory spatial data analysis.
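
For orientation, the classical Moran's I and Geary's C that the abstract critiques can be computed as in the sketch below. It implements the standard textbook definitions only, not the corrected coefficients the paper proposes (the abstract does not give them). The function names, the chain-shaped weight matrix, and the sample values are assumptions made for this example.

```python
def morans_i(x, w):
    """Classical Moran's I for values x and spatial weight matrix w (list of lists)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    s0 = sum(sum(row) for row in w)  # total of all weights
    num = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * num / den


def gearys_c(x, w):
    """Classical Geary's C for values x and spatial weight matrix w (list of lists)."""
    n = len(x)
    mean = sum(x) / n
    s0 = sum(sum(row) for row in w)
    num = sum(w[i][j] * (x[i] - x[j]) ** 2 for i in range(n) for j in range(n))
    den = sum((v - mean) ** 2 for v in x)
    return ((n - 1) / (2 * s0)) * num / den


if __name__ == "__main__":
    # Hypothetical example: four regions along a line, each adjacent to its neighbours.
    values = [1.0, 2.0, 3.0, 4.0]
    weights = [
        [0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0],
    ]
    print(round(morans_i(values, weights), 4))  # 0.3333
    print(round(gearys_c(values, weights), 4))  # 0.3
```

With a smooth gradient across neighbours, Moran's I is positive and Geary's C is below 1, both indicating positive spatial autocorrelation. The two statistics are built from different products (cross-deviations versus squared differences) and carry different normalizing factors.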

    Advanced of Mathematics-Statistics Methods to Radar Calibration for Rainfall Estimation; A Review

    Ground-based radar is one of the most important systems for precipitation measurement at high spatial and temporal resolutions. Radar data are recorded digitally and can readily be ingested into statistical analyses. These measurements are subjected to specific calibration to eliminate systematic errors and to minimize random errors. Since statistical methods are based on mathematics, they offer more precise results and easy interpretation with lower data detail. Although their mathematical structure can make them challenging to interpret, the accuracy of the conclusions and the interpretation of the output are appropriate. This article reviews advanced methods for calibrating ground-based radar for forecasting meteorological events, covering two aspects: statistical techniques and data mining. Statistical techniques refer to empirical analyses such as regression, while data mining includes the Artificial Neural Network (ANN), data Kriging, Nearest Neighbour (NN), Decision Tree (DT) and fuzzy logic. The results show that Kriging is more applicable for interpolation. Regression methods are simple to use, and data mining based on Artificial Intelligence is very precise. Thus, this review explores the characteristics of the statistical parameters in the field of radar applications and shows which parameters give the best results for undefined cases. DOI: 10.17762/ijritcc2321-8169.15012

    Data envelopment analysis and data mining to efficiency estimation and evaluation

    Purpose: This paper aims to assess the application of seven statistical and data mining techniques to second-stage data envelopment analysis (DEA) for bank performance. Design/methodology/approach: Different statistical and data mining techniques are applied to second-stage DEA for bank performance as part of an attempt to produce a powerful model for bank performance with effective predictive ability. The data mining tools considered are classification and regression trees (CART), conditional inference trees (CIT), random forests based on CART and CIT, bagging, artificial neural networks and their statistical counterpart, logistic regression. Findings: The results showed that random forests and bagging outperform other methods in terms of predictive power. Originality/value: This is the first study to assess the impact of environmental factors on banking performance in Middle East and North Africa countries.