6 research outputs found

    Estimating basis functions in massive fields under the spatial random effects model

    Spatial prediction is commonly achieved under the assumption of a Gaussian random field (GRF) by obtaining maximum likelihood estimates of parameters, and then using the kriging equations to arrive at predicted values. For massive datasets, fixed rank kriging using the Expectation-Maximization (EM) algorithm for estimation has been proposed as an alternative to the usual but computationally prohibitive kriging method. The method reduces the computational cost of estimation by redefining the spatial process as a linear combination of basis functions and spatial random effects. A disadvantage of this method is that it imposes constraints on the relationship between the observed locations and the knots. We develop an alternative method that utilizes the Spatial Mixed Effects (SME) model, but allows for additional flexibility by estimating the range of the spatial dependence between the observations and the knots via an Alternating Expectation Conditional Maximization (AECM) algorithm. Experiments show that our methodology improves estimation without sacrificing prediction accuracy while also minimizing the additional computational burden of extra parameter estimation. The methodology is applied to a temperature data set archived by the United States National Climatic Data Center, with improved results over previous methodology.
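The abstract does not state the model, so the following is only a sketch of the standard fixed rank kriging formulation it appears to build on. All symbols are assumptions introduced here for illustration: n observations, r knots with r much smaller than n, an n-by-r basis matrix S, random effects η with r-by-r covariance K, and measurement error variance σ².

```latex
% Assumed notation, not taken from the abstract.
\begin{align}
  Z(s_i) &= x(s_i)^\top \beta + S(s_i)^\top \eta + \varepsilon(s_i),
  \qquad \eta \sim N_r(0, K), \quad \varepsilon(s_i) \sim N(0, \sigma^2), \\
  \Sigma &= S K S^\top + \sigma^2 I_n, \\
  \Sigma^{-1} &= \sigma^{-2} I_n
    - \sigma^{-4} S \left( K^{-1} + \sigma^{-2} S^\top S \right)^{-1} S^\top .
\end{align}
```

The last line is the Sherman-Morrison-Woodbury identity applied to the second. It shows where the saving in computation comes from under these assumptions: only r-by-r matrices are inverted, never the full n-by-n covariance.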

    Solution Path Clustering with Robust Loss and Concave Penalty

    The main purpose of this dissertation is to demonstrate that using a robust loss function (instead of the usual least squares loss) improves the clustering quality in the solution path clustering scheme. Cluster analysis simultaneously attempts to determine the number of clusters and estimate cluster location and membership. Convex clustering, distinguishing itself from other popular clustering methods, casts the clustering objective as a convex optimization problem and thus admits a global solution. It is a useful exploratory technique which outputs a solution path, evoking the name, "solution path clustering." The solution path is a tree-like structure with cluster results ranging from n clusters down to a single cluster. Now, the benefits of convex clustering come at a cost since the use of a convex penalty can seriously bias the results and ruin the search for good cluster results. To lessen the bias, Ma and Huang (2017) proposed concave penalties to form the cluster centers. While the clustering objective is no longer convex, the quality of the solutions is improved. We extend the solution path clustering scheme by implementing robust loss functions instead of the usual least squares loss. Following Ma and Huang (2017), we also use a concave penalty to form clusters. The robust loss and concave penalty work together to mitigate the influence of outliers and minimize bias in the estimation of cluster locations, especially when the true distance between clusters is large. We introduce the IRLS-ADMM algorithm to minimize our proposed objective function and prove its convergence to a local minimum. Any loss function that admits an IRLS formulation or a majorizing surrogate can be used. We also study asymptotic and oracle properties of the estimator. Finally, we demonstrate the performance of our proposed method through simulation experiments and on real data sets, as well as provide some preliminary results on choosing the number of clusters via the modified BIC (Wang, Li, and Leng, 2009).
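To illustrate what "admits an IRLS formulation" can mean, here is a minimal sketch of iteratively reweighted least squares for a single one-dimensional center under the Huber loss. It is not the dissertation's IRLS-ADMM algorithm: there is no penalty term and no ADMM step, and the function names, the threshold `delta`, and the toy data are all hypothetical choices made for this example.

```python
def huber_weight(residual, delta=1.0):
    """IRLS weight for the Huber loss: 1 inside the threshold, delta/|r| outside."""
    size = abs(residual)
    return 1.0 if size <= delta else delta / size


def irls_location(points, delta=1.0, tol=1e-10, max_iter=200):
    """Estimate a robust 1-D center by iteratively reweighted least squares."""
    center = sum(points) / len(points)  # start from the least squares solution
    for _ in range(max_iter):
        # Each pass solves a weighted least squares problem in closed form.
        weights = [huber_weight(x - center, delta) for x in points]
        new_center = sum(w * x for w, x in zip(weights, points)) / sum(weights)
        if abs(new_center - center) < tol:
            return new_center
        center = new_center
    return center


# Hypothetical toy data: five inliers near 1 and a single gross outlier.
data = [0.9, 1.0, 1.1, 1.2, 0.8, 50.0]
mean_estimate = sum(data) / len(data)
robust_estimate = irls_location(data, delta=1.0)
```

On this toy data the plain mean is dragged toward the outlier, while the reweighted estimate stays close to the inliers, which is the effect the abstract attributes to the robust loss.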

    New methodological contributions in time series clustering

    Programa Oficial de Doutoramento en Estatística e Investigación Operativa. 555V01
    [Abstract] This thesis presents new procedures for the cluster analysis of time series. First, a two-stage procedure based on comparing the frequencies and magnitudes of the absolute maxima of the spectral densities is proposed. Assuming that the purpose of clustering is to group series according to their underlying dependence structures, a detailed study is also carried out of the clustering behavior of a dissimilarity based on comparing estimated quantile autocovariance functions (QAF). A prediction-based resampling algorithm proposed by Dudoit and Fridlyand is adapted to select the optimal number of clusters. The asymptotic behavior of the sample quantile autocovariances is studied, and an algorithm to determine optimal combinations of lags and pairs of quantile levels for clustering is introduced. The proposed metric is used to perform hard and soft partitioning-based clustering. First, a broad simulation study examines the behavior of the proposed metric in crisp clustering using hierarchical and PAM procedures. Then, a novel fuzzy C-medoids algorithm based on the QAF dissimilarity is proposed. Three robust versions of this fuzzy algorithm are also presented to deal with data containing outlier time series. Finally, other approaches to soft clustering are explored, namely probabilistic D-clustering and clustering based on mixture models.
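The abstract names a dissimilarity built from estimated quantile autocovariances but gives no formula. The sketch below assumes one common definition: the sample covariance between the indicators that the series is at or below one quantile at time t and at or below another at time t + lag, with the dissimilarity taken as the squared Euclidean distance between such values over chosen lags and level pairs. The function names, the simple order-statistic quantile, and the default lags and levels are hypothetical choices, not the thesis's.

```python
import math


def empirical_quantile(series, tau):
    """Simple empirical quantile: the ceil(tau * T)-th order statistic."""
    ordered = sorted(series)
    index = max(0, math.ceil(tau * len(ordered)) - 1)
    return ordered[index]


def qaf(series, lag, tau1, tau2):
    """Sample quantile autocovariance at one lag and one pair of levels."""
    q1 = empirical_quantile(series, tau1)
    q2 = empirical_quantile(series, tau2)
    n_pairs = len(series) - lag
    # Fraction of times both indicator events occur, lag steps apart.
    joint = sum(
        1 for t in range(n_pairs)
        if series[t] <= q1 and series[t + lag] <= q2
    )
    return joint / n_pairs - tau1 * tau2


def qaf_dissimilarity(x, y, lags=(1,), levels=(0.1, 0.5, 0.9)):
    """Squared Euclidean distance between the QAF feature vectors of x and y."""
    total = 0.0
    for lag in lags:
        for tau1 in levels:
            for tau2 in levels:
                diff = qaf(x, lag, tau1, tau2) - qaf(y, lag, tau1, tau2)
                total += diff * diff
    return total


# Hypothetical toy series: one persistent, one alternating high and low.
persistent = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
alternating = [1, 10, 2, 9, 3, 8, 4, 7, 5, 6]
distance = qaf_dissimilarity(persistent, alternating)
```

The two toy series contain the same values, so their marginal distributions match; any positive distance comes only from their different serial dependence, which is the property that makes such a measure suited to grouping series by dependence structure. A matrix of these pairwise distances could then be passed to a hierarchical or PAM routine.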