Search CORE

27,537 research outputs found

Recommended from our members

Information content of spatially distributed ground-based measurements for hydrologic-parameter calibration in mixed rain-snow mountain headwaters

Author: Avanzi F, Bales RC, Conklin MH, Glaser SD, Maurer T
Publication venue: eScholarship, University of California
Publication date: 01/03/2020

Parameters in hydrologic models used in mixed rain-snow regions are often difficult to calibrate and overfitted to streamflow. To help address these challenges, we used an algorithm that assesses model performance through time (Dynamic Identifiability Analysis) to quantify the information content of spatially distributed ground-based measurements for identifying optimal parameter values in the Precipitation Runoff Modeling System (PRMS) model. Including spatially distributed ground-based measurements in Identifiability Analysis allowed us to unambiguously estimate more parameter values than using streamflow alone (seven parameters instead of two, out of a pool of thirty-three). Peaks in information gain were obtained when using dew-point temperature to identify precipitation phase-partitioning parameters. Multi-attribute identifiability analysis also yielded optimal parameter values that were temporally less variable than those estimated using streamflow alone. Overall, identifying parameter values using ground-based measurements improved the simulation of key drivers of the surface-water budget, such as air temperature and precipitation-phase partitioning. However, parameters simulating surface-to-subsurface mass fluxes, like snow accumulation and melt or evapotranspiration, were poorly identified by any attribute and so emerged as key sources of predictive uncertainty for this distributed-parameter hydrologic model. This work demonstrates the value of expanded ground-based measurements for identifying parameters in distributed-parameter hydrologic models and so diagnosing their conceptual uncertainty across the water budget.

eScholarship - University of California

A multi-resolution approximation for massive spatial datasets

Author: Katzfuss Matthias
Publication venue: Informa UK Limited
Publication date: 07/12/2015

Automated sensing instruments on satellites and aircraft have enabled the collection of massive amounts of high-resolution observations of spatial fields over large spatial regions. If these datasets can be efficiently exploited, they can provide new insights on a wide variety of issues. However, traditional spatial-statistical techniques such as kriging are not computationally feasible for big datasets. We propose a multi-resolution approximation (M-RA) of Gaussian processes observed at irregular locations in space. The M-RA process is specified as a linear combination of basis functions at multiple levels of spatial resolution, which can capture spatial structure from very fine to very large scales. The basis functions are automatically chosen to approximate a given covariance function, which can be nonstationary. All computations involving the M-RA, including parameter inference and prediction, are highly scalable for massive datasets. Crucially, the inference algorithms can also be parallelized to take full advantage of large distributed-memory computing environments. In comparisons using simulated data and a large satellite dataset, the M-RA outperforms a related state-of-the-art method. Comment: 23 pages; to be published in Journal of the American Statistical Association.

arXiv.org e-Print Archive

FigShare

Approaches to conceptual clustering

Author: Fisher Douglas, Langley Pat
Publication venue: eScholarship, University of California
Publication date: 12/07/1985

Methods for conceptual clustering may be explicated in two lights. Conceptual clustering methods may be viewed as extensions to techniques of numerical taxonomy, a collection of methods developed by social and natural scientists for creating classification schemes over object sets. Alternatively, conceptual clustering may be viewed as a form of learning by observation or concept formation, as opposed to methods of learning from examples or concept identification. In this paper we survey and compare a number of conceptual clustering methods along dimensions suggested by each of these views. The point we most wish to clarify is that conceptual clustering processes can be explicated as being composed of three distinct but inter-dependent subprocesses: the process of deriving a hierarchical classification scheme; the process of aggregating objects into individual classes; and the process of assigning conceptual descriptions to object classes. Each subprocess may be characterized along a number of dimensions related to search, thus facilitating a better understanding of the conceptual clustering process as a whole.

eScholarship - University of California

Distributed Correlation-Based Feature Selection in Spark

Author: Alonso-Betanzos Amparo, de-Marcos Luis, Palma-Mendoza Raul-Jose, Rodriguez Daniel
Publication venue: Elsevier BV
Publication date: 31/01/2019

CFS (Correlation-Based Feature Selection) is a feature selection (FS) algorithm that has been successfully applied to classification problems in many domains. We describe Distributed CFS (DiCFS) as a completely redesigned, scalable, parallel and distributed version of the CFS algorithm, capable of dealing with the large volumes of data typical of big data applications. Two versions of the algorithm were implemented and compared using the Apache Spark cluster computing model, currently gaining popularity due to its much faster processing times than Hadoop's MapReduce model. We tested our algorithms on four publicly available datasets, each consisting of a large number of instances and two also consisting of a large number of features. The results show that our algorithms were superior in terms of both time-efficiency and scalability. In leveraging a computer cluster, they were able to handle larger datasets than the non-distributed WEKA version while maintaining the quality of the results, i.e., exactly the same features were returned by our algorithms when compared to the original algorithm available in WEKA. Comment: 25 pages, 5 figures.

arXiv.org e-Print Archive

Repositorio da Universidade da Coruña
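
The abstract above does not spell out how CFS scores a candidate feature subset. As a rough illustration only, the sketch below computes the standard CFS merit heuristic, merit = k * r_cf / sqrt(k + k(k-1) * r_ff), where k is the subset size, r_cf the mean feature-class correlation and r_ff the mean feature-feature correlation. The function name `cfs_merit` and its list-based inputs are assumptions made for this example; they are not taken from DiCFS or WEKA, and the choice of correlation measure is left to the caller.

```python
import math


def cfs_merit(feature_class_corr, feature_feature_corr):
    """Merit of a feature subset under the CFS heuristic (illustrative sketch).

    feature_class_corr: correlations between each of the k features and the class.
    feature_feature_corr: pairwise correlations among the k features
        (k*(k-1)/2 values; empty when k == 1).
    """
    k = len(feature_class_corr)
    if k == 0:
        return 0.0
    # Mean feature-class correlation: rewards relevance to the class.
    r_cf = sum(feature_class_corr) / k
    # Mean feature-feature correlation: penalizes redundancy within the subset.
    if feature_feature_corr:
        r_ff = sum(feature_feature_corr) / len(feature_feature_corr)
    else:
        r_ff = 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)
```

With two features that are equally relevant, the merit is higher when they are uncorrelated with each other than when they are fully redundant, which is the trade-off a CFS subset search optimizes.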

Efficient regularized isotonic regression with application to gene-gene interaction search

Author: Luss Ronny, Rosset Saharon, Shahar Moni
Publication venue: Institute of Mathematical Statistics
Publication date: 20/03/2012

Isotonic regression is a nonparametric approach for fitting monotonic models to data that has been widely studied from both theoretical and practical perspectives. However, this approach encounters computational and statistical overfitting issues in higher dimensions. To address both concerns, we present an algorithm, which we term Isotonic Recursive Partitioning (IRP), for isotonic regression based on recursively partitioning the covariate space through solution of progressively smaller "best cut" subproblems. This creates a regularized sequence of isotonic models of increasing model complexity that converges to the global isotonic regression solution. The models along the sequence are often more accurate than the unregularized isotonic regression model because of the complexity control they offer. We quantify this complexity control through estimation of degrees of freedom along the path. Success of the regularized models in prediction and IRP's favorable computational properties are demonstrated through a series of simulated and real data experiments. We discuss application of IRP to the problem of searching for gene-gene interactions and epistasis, and demonstrate it on data from genome-wide association studies of three common diseases. Comment: Published at http://dx.doi.org/10.1214/11-AOAS504 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

arXiv.org e-Print Archive

Crossref
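
For readers unfamiliar with the base problem in the isotonic regression entry above, the sketch below fits a non-decreasing least-squares model to a one-dimensional sequence with the classic pool-adjacent-violators algorithm. This is not the paper's IRP method, which targets the multi-dimensional case; it only illustrates what "fitting a monotonic model" means in the simplest setting. The function name `isotonic_fit` and the optional weights argument are assumptions made for this example.

```python
def isotonic_fit(y, w=None):
    """Non-decreasing least-squares fit to y via pool-adjacent-violators.

    y: observations, already ordered by the covariate.
    w: optional positive weights (defaults to 1.0 for every point).
    Returns a list of fitted values of the same length as y.
    """
    if w is None:
        w = [1.0] * len(y)
    # Each block holds [weighted mean, total weight, number of points].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Merge the last two blocks while they violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    # Expand each block's mean back to one fitted value per point.
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit
```

For example, `isotonic_fit([1, 3, 2, 4])` pools the violating pair (3, 2) into its mean and returns `[1.0, 2.5, 2.5, 4.0]`.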