
    Semiparametric estimation of a two-component mixture of linear regressions in which one component is known

    A new estimation method for the two-component mixture model introduced in \cite{Van13} is proposed. This model consists of a two-component mixture of linear regressions in which one component is entirely known while the proportion, the slope, the intercept and the error distribution of the other component are unknown. Despite performing well on datasets of reasonable size, the method proposed in \cite{Van13} suffers from a serious drawback when the sample size becomes large, as it is based on the optimization of a contrast function whose pointwise computation requires O(n^2) operations. The range of applicability of the method derived in this work is substantially larger, as it relies on a method-of-moments estimator free of tuning parameters whose computation requires O(n) operations. From a theoretical perspective, the asymptotic normality of both the estimator of the Euclidean parameter vector and the semiparametric estimator of the c.d.f. of the error is proved under weak conditions not involving zero-symmetry assumptions. In addition, an approximate confidence band for the c.d.f. of the error can be computed using a weighted bootstrap whose asymptotic validity is proved. The finite-sample performance of the resulting estimation procedure is studied under various scenarios through Monte Carlo experiments. The proposed method is illustrated on three real datasets of size n=150, 51 and 176,343, respectively. Two extensions of the considered model are discussed in the final section: a model with an additional scale parameter for the first component, and a model with more than one explanatory variable. Comment: 43 pages, 4 figures, 5 tables

    A Methodology and Tool for Rapid Prototyping of Data Warehouses using Data Mining: Application to Birds Biodiversity

    Data Warehouses (DWs) are large repositories of data aimed at supporting the decision-making process by enabling flexible and interactive analyses via OLAP systems. Rapid prototyping of DWs is necessary when OLAP applications are complex. Some work has been done on integrating Data Mining (DM) and OLAP systems, either to enhance OLAP operators with mined indicators or to define the DW schema. However, to the best of our knowledge, prototyping methods for DWs do not support this kind of integration. In this paper, we therefore present a new prototyping methodology for DWs, extending [3], in which DM methods are used to define the DW schema. We validate our approach on a real data set concerning bird biodiversity.