Sequences of regressions and their independences
Ordered sequences of univariate or multivariate regressions provide
statistical models for analysing data from randomized, possibly sequential
interventions, from cohort or multi-wave panel studies, but also from
cross-sectional or retrospective studies. Conditional independences are
captured by what we name regression graphs, provided the generated distribution
shares some properties with a joint Gaussian distribution. Regression graphs
extend purely directed, acyclic graphs by two types of undirected graph, one
type for components of joint responses and the other for components of the
context vector variable. We review the special features and the history of
regression graphs, derive criteria to read all implied independences of a
regression graph and prove criteria for Markov equivalence, that is, for judging
whether two different graphs imply the same set of independence statements.
Knowledge of Markov equivalence provides alternative interpretations of a given
sequence of regressions, is essential for machine learning strategies and
permits the use of the simple graphical criteria of regression graphs on graphs for
which the corresponding criteria are in general more complex. Under the known
conditions that a Markov equivalent directed acyclic graph exists for any given
regression graph, we give a polynomial time algorithm to find one such graph.
Comment: 43 pages with 17 figures. The manuscript is to appear as an invited
discussion paper in the journal TES
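
The Markov equivalence question raised in this abstract can be illustrated for the special case of purely directed acyclic graphs, where the classical criterion says that two DAGs are Markov equivalent exactly when they share the same skeleton and the same v-structures. The sketch below is only that special case, not the paper's criteria for regression graphs; the dictionary encoding (`{child: set_of_parents}`) and all function names are assumptions made for illustration, and both graphs are assumed to have the same node set.

```python
from itertools import combinations


def skeleton(dag):
    """Undirected edge set of a DAG given as {child: set_of_parents}."""
    return {frozenset((p, c)) for c, parents in dag.items() for p in parents}


def v_structures(dag):
    """Colliders a -> c <- b whose two parents a and b are not adjacent."""
    skel = skeleton(dag)
    return {
        (frozenset((a, b)), c)
        for c, parents in dag.items()
        for a, b in combinations(sorted(parents), 2)
        if frozenset((a, b)) not in skel
    }


def markov_equivalent(dag1, dag2):
    """Classical DAG criterion: same skeleton and same v-structures."""
    return (skeleton(dag1) == skeleton(dag2)
            and v_structures(dag1) == v_structures(dag2))


# Hypothetical three-node examples.
chain = {"b": {"a"}, "c": {"b"}}           # a -> b -> c
reversed_chain = {"b": {"c"}, "a": {"b"}}  # c -> b -> a
collider = {"b": {"a", "c"}}               # a -> b <- c

print(markov_equivalent(chain, reversed_chain))
print(markov_equivalent(chain, collider))
```

The two chains imply the same single independence (a and c given b), while the collider implies a different one (a and c marginally), which is why the v-structure check separates them even though all three graphs share a skeleton.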
Tensor Networks for Dimensionality Reduction and Large-Scale Optimizations. Part 2 Applications and Future Perspectives
Part 2 of this monograph builds on the introduction to tensor networks and
their operations presented in Part 1. It focuses on tensor network models for
super-compressed higher-order representation of data/parameters and related
cost functions, while providing an outline of their applications in machine
learning and data analytics. A particular emphasis is on the tensor train (TT)
and Hierarchical Tucker (HT) decompositions, and their physically meaningful
interpretations which reflect the scalability of the tensor network approach.
Through a graphical approach, we also elucidate how, by virtue of the
underlying low-rank tensor approximations and sophisticated contractions of
core tensors, tensor networks have the ability to perform distributed
computations on otherwise prohibitively large volumes of data/parameters,
thereby alleviating or even eliminating the curse of dimensionality. The
usefulness of this concept is illustrated over a number of applied areas,
including generalized regression and classification (support tensor machines,
canonical correlation analysis, higher order partial least squares),
generalized eigenvalue decomposition, Riemannian optimization, and the
optimization of deep neural networks. Part 1 and Part 2 of this work can be
used either as stand-alone separate texts, or indeed as a conjoint
comprehensive review of the exciting field of low-rank tensor networks and
tensor decompositions.
Comment: 232 pages
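
As a rough illustration of the tensor train (TT) format emphasized in this abstract, the sketch below builds TT cores by a sequence of truncated SVDs (the procedure commonly called TT-SVD) and then contracts them back into a full tensor. It is a minimal NumPy sketch under stated assumptions, not the monograph's own algorithms: the function names, the single `max_rank` cap, and the synthetic test tensor are all chosen here for illustration.

```python
import numpy as np


def tt_svd(tensor, max_rank):
    """Split `tensor` into TT cores of shape (r_prev, n_k, r_next) via truncated SVDs."""
    dims = tensor.shape
    cores = []
    rank_prev = 1
    mat = np.asarray(tensor)
    for k in range(len(dims) - 1):
        # Unfold: rows index (previous rank, current mode), columns the remaining modes.
        mat = mat.reshape(rank_prev * dims[k], -1)
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        rank = min(max_rank, len(s))
        cores.append(u[:, :rank].reshape(rank_prev, dims[k], rank))
        # Carry the remainder (singular values times right factors) to the next mode.
        mat = s[:rank, None] * vt[:rank]
        rank_prev = rank
    cores.append(mat.reshape(rank_prev, dims[-1], 1))
    return cores


def tt_to_full(cores):
    """Contract TT cores back into the full tensor."""
    full = cores[0]
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=([-1], [0]))
    return full.reshape([c.shape[1] for c in cores])


# Synthetic tensor with TT ranks of at most 2 (an assumption made for the demo).
rng = np.random.default_rng(0)
true_cores = [rng.standard_normal(shape)
              for shape in [(1, 4, 2), (2, 5, 2), (2, 6, 2), (2, 3, 1)]]
full = tt_to_full(true_cores)

cores = tt_svd(full, max_rank=2)
error = np.linalg.norm(tt_to_full(cores) - full) / np.linalg.norm(full)
print("relative reconstruction error:", error)
print("stored entries:", sum(c.size for c in cores), "of", full.size)
```

The storage of the cores grows linearly with the number of modes for fixed ranks, whereas the full tensor grows exponentially; this is the sense in which low-rank tensor networks can ease the curse of dimensionality.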
Predicting soil organic carbon in a small farm system using in situ spectral measurements and the random forest regression
A research report submitted to the Faculty of Science, University of the Witwatersrand, Johannesburg, in partial fulfillment of the requirements for the degree of Master of Science (Geographical Information Sciences and Remote Sensing). Johannesburg, 2017.
Soil organic carbon is considered the most decisive indicator of soil fertility. The purpose of this research was to predict soil organic carbon in the Mokhotlong region of eastern Lesotho using in situ spectral measurements and random forest regression. Soil reflectance spectra were acquired with a portable field spectrometer.
The performance of random forest regression was assessed by comparing it with one of the most popular models in spectroscopy, partial least squares regression. Laboratory spectroscopy measurements of the soil samples were analysed to assess the accuracy of the in situ spectroscopy-based models. The effect of the Savitzky-Golay first derivative on partial least squares regression and random forest regression with both types of spectral data was also assessed.
The results indicated that random forest regression could accurately predict soil organic carbon content on an independent dataset using in situ spectroscopy data (RPD = 3.77, Rp2 = 0.88, RMSEP = 0.64%). The overall best predictive model was achieved with the derivative laboratory spectral data using random forest with the optimum number of key wavelengths (RPD = 3.77, Rp2 = 0.88, RMSEP = 0.64%). In contrast, partial least squares regression was likely to overfit the calibration dataset. Important wavelengths for predicting soil organic carbon content were localised around the visible range (400-700 nm). An implication of this research is that soil organic carbon can be accurately estimated using derivative in situ spectroscopy measurements and random forest regression with key wavelengths.
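
The validation statistics quoted in this abstract (RMSEP, R² of prediction, and RPD) can be computed from observed and predicted values on an independent test set. The sketch below uses common chemometric conventions that the abstract itself does not spell out, so they are assumptions: RMSEP as the root mean squared prediction error, R² as one minus the ratio of residual to total sum of squares, and RPD as the sample standard deviation of the observed values divided by RMSEP. The function name and the toy numbers are hypothetical and are not the study's data.

```python
import math
import statistics


def prediction_metrics(observed, predicted):
    """Return RMSEP, R2 and RPD for paired observed/predicted values."""
    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]
    sse = sum(r * r for r in residuals)
    rmsep = math.sqrt(sse / n)
    mean_obs = statistics.fmean(observed)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - sse / ss_tot
    # RPD: spread of the reference values relative to the prediction error.
    rpd = statistics.stdev(observed) / rmsep if rmsep > 0 else math.inf
    return {"RMSEP": rmsep, "R2": r2, "RPD": rpd}


# Toy values for illustration only (not measurements from the study).
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.5, 2.5, 3.5, 4.5, 5.5]
metrics = prediction_metrics(observed, predicted)
print(metrics)
```

Because RPD scales the error by the spread of the reference values, it allows models to be compared across datasets with different ranges of soil organic carbon.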
Chemometrics for ion mobility spectrometry data: Recent advances and future prospects
Historically, advances in the field of ion mobility spectrometry have been hindered by the variation in measured signals between instruments developed by different research laboratories or manufacturers. This has triggered the development and application of chemometric techniques able to reveal and analyze the valuable information content of ion mobility spectra. Recent advances in the multidimensional coupling of ion mobility spectrometry to chromatography and mass spectrometry have created new, unique challenges for data processing, yielding high-dimensional, megavariate datasets. In this paper, a complete overview of available chemometric techniques used in the analysis of ion mobility spectrometry data is given. We describe the current state of the art of ion mobility spectrometry data analysis, covering datasets of different complexities and two different scopes of data analysis, i.e. targeted and non-targeted analyte analyses. Two main steps of data analysis are considered: data preprocessing and pattern recognition. A detailed description of recent advances in chemometric techniques is provided for these steps, together with a list of interesting applications. We demonstrate that chemometric techniques have contributed significantly to the recent and great expansion of ion mobility spectrometry technology into different application fields. We conclude that well-thought-out, comprehensive data analysis strategies are currently emerging, including several chemometric techniques and addressing different data challenges. In our opinion, this trend will continue in the near future, stimulating developments in ion mobility spectrometry instrumentation even further.
Application of temporal streamflow descriptors in hydrologic model parameter estimation
This paper presents a parameter estimation approach based on hydrograph descriptors that capture dominant streamflow characteristics at three timescales (monthly, yearly, and record extent). The scheme, entitled hydrograph descriptors multitemporal sensitivity analyses (HYDMUS), yields an ensemble of model simulations generated from a reduced parameter space, based on a set of streamflow descriptors that emphasize the timescale dynamics of the streamflow record. In this procedure, the posterior distributions of model parameters derived at coarser timescales are used to sample model parameters for the next finer timescale. The procedure was used to estimate the parameters of the Sacramento soil moisture accounting model (SAC-SMA) for the Leaf River, Mississippi. The results indicated that, in addition to a significant reduction in the range of parameter uncertainty, HYDMUS improved parameter identifiability for all 13 of the model parameters. The performance of the procedure was compared to four previous calibration studies on the same watershed. Although our application of HYDMUS did not explicitly consider the error at each simulation time step during the calibration process, the model performance was, in some important respects, found to be better than in previous deterministic studies. Copyright 2005 by the American Geophysical Union