19,677 research outputs found
On multi-view learning with additive models
In many scientific settings data can be naturally partitioned into variable
groupings called views. Common examples include environmental (1st view) and
genetic information (2nd view) in ecological applications, chemical (1st view)
and biological (2nd view) data in drug discovery. Multi-view data also occur in
text analysis and proteomics applications where one view consists of a graph
with observations as the vertices and a weighted measure of pairwise similarity
between observations as the edges. Further, in several of these applications
the observations can be partitioned into two sets, one where the response is
observed (labeled) and the other where the response is not (unlabeled). The
problem for simultaneously addressing viewed data and incorporating unlabeled
observations in training is referred to as multi-view transductive learning. In
this work we introduce and study a comprehensive generalized fixed point
additive modeling framework for multi-view transductive learning, where any
view is represented by a linear smoother. The problem of view selection is
discussed using a generalized Akaike Information Criterion, which provides an
approach for testing the contribution of each view. An efficient implementation
is provided for fitting these models with both backfitting and local-scoring
type algorithms adjusted to semi-supervised graph-based learning. The proposed
technique is assessed on both synthetic and real data sets and is shown to be
competitive to state-of-the-art co-training and graph-based techniques.Comment: Published in at http://dx.doi.org/10.1214/08-AOAS202 the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
A Comparative Study of Different Kernel Functions Applied to LW-KPLS Model for Nonlinear Processes
Soft sensors are inferential estimators when the employment of hardware sensors is
inapplicable, expensive, or difficult in industrial plant processes. Currently, a simple soft sensor, namely
locally weighted partial least squares (LW-PLS), which can cope with the nonlinearity of the process,
has been developed. However, LW-PLS exhibits the disadvantages of handling strong nonlinear process
data. To address this problem, Kernel functions are integrated into LW-PLS to form locally weighted
Kernel partial least squares (LW-KPLS). Notice that a minimal study was carried out on the impact of
different kernel functions that have not been integrated with the LW-KPLS, in which this model has the
potential to be applied to different chemical-related nonlinear processes. Thus, this study investigates
the predictive performance of LW-KPLS with several different Kernel functions using three nonlinear
case studies. As the results, the predictive performances of LW-KPLS with Polynomial Kernel are better
than other Kernel functions. The values of root-mean-square errors (RMSE) and error of approximation
(Ea) for the training and testing dataset by utilizing this Kernel function are the lowest in their respective
case studies, which are 34.60% to 95.39% lower for RMSEs values and 68.20% to 95.49% smaller for
Ea values
Optimal management of bio-based energy supply chains under parametric uncertainty through a data-driven decision-support framework
This paper addresses the optimal management of a multi-objective bio-based energy supply chain network subjected to multiple sources of uncertainty. The complexity to obtain an optimal solution using traditional uncertainty management methods dramatically increases with the number of uncertain factors considered. Such a complexity produces that, if tractable, the problem is solved after a large computational effort. Therefore, in this work a data-driven decision-making framework is proposed to address this issue. Such a framework exploits machine learning techniques to efficiently approximate the optimal management decisions considering a set of uncertain parameters that continuously influence the process behavior as an input. A design of computer experiments technique is used in order to combine these parameters and produce a matrix of representative information. These data are used to optimize the deterministic multi-objective bio-based energy network problem through conventional optimization methods, leading to a detailed (but elementary) map of the optimal management decisions based on the uncertain parameters. Afterwards, the detailed data-driven relations are described/identified using an Ordinary Kriging meta-model. The result exhibits a very high accuracy of the parametric meta-models for predicting the optimal decision variables in comparison with the traditional stochastic approach. Besides, and more importantly, a dramatic reduction of the computational effort required to obtain these optimal values in response to the change of the uncertain parameters is achieved. Thus the use of the proposed data-driven decision tool promotes a time-effective optimal decision making, which represents a step forward to use data-driven strategy in large-scale/complex industrial problems.Peer ReviewedPostprint (published version
Model-based analysis of the potential of macroinvertebrates as indicators for microbial pathogens in rivers
The quality of water prior to its use for drinking, farming or recreational purposes must comply with several physicochemical and microbiological standards to safeguard society and the environment. In order to satisfy these standards, expensive analyses and highly trained personnel in laboratories are required. Whereas macroinvertebrates have been used as ecological indicators to review the health of aquatic ecosystems. In this research, the relationship between microbial pathogens and macrobenthic invertebrate taxa was examined in the Machangara River located in the southern Andes of Ecuador, in which 33 sites, according to their land use, were chosen to collect physicochemical, microbiological and biological parameters. Decision tree models (DTMs) were used to generate rules that link the presence and abundance of some benthic families to microbial pathogen standards. The aforementioned DTMs provide an indirect, approximate, and quick way of checking the fulfillment of Ecuadorian regulations for water use related to microbial pathogens. The models built and optimized with the WEKA package, were evaluated based on both statistical and ecological criteria to make them as clear and simple as possible. As a result, two different and reliable models were obtained, which could be used as proxy indicators in a preliminary assessment of pollution of microbial pathogens in rivers. The DTMs can be easily applied by staff with minimal training in the identification of the sensitive taxa selected by the models. The presence of selected macroinvertebrate taxa in conjunction with the decision trees can be used as a screening tool to evaluate sites that require additional follow up analyses to confirm whether microbial water quality standards are met
Estimating Local Function Complexity via Mixture of Gaussian Processes
Real world data often exhibit inhomogeneity, e.g., the noise level, the
sampling distribution or the complexity of the target function may change over
the input space. In this paper, we try to isolate local function complexity in
a practical, robust way. This is achieved by first estimating the locally
optimal kernel bandwidth as a functional relationship. Specifically, we propose
Spatially Adaptive Bandwidth Estimation in Regression (SABER), which employs
the mixture of experts consisting of multinomial kernel logistic regression as
a gate and Gaussian process regression models as experts. Using the locally
optimal kernel bandwidths, we deduce an estimate to the local function
complexity by drawing parallels to the theory of locally linear smoothing. We
demonstrate the usefulness of local function complexity for model
interpretation and active learning in quantum chemistry experiments and fluid
dynamics simulations.Comment: 19 pages, 16 figure
- …