19,677 research outputs found

On multi-view learning with additive models

Author: Culp Mark
Johnson Kjell
Michailidis George
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2009

In many scientific settings data can be naturally partitioned into variable groupings called views. Common examples include environmental (1st view) and genetic information (2nd view) in ecological applications, and chemical (1st view) and biological (2nd view) data in drug discovery. Multi-view data also occur in text analysis and proteomics applications where one view consists of a graph with observations as the vertices and a weighted measure of pairwise similarity between observations as the edges. Further, in several of these applications the observations can be partitioned into two sets, one where the response is observed (labeled) and the other where the response is not (unlabeled). The problem of simultaneously addressing viewed data and incorporating unlabeled observations in training is referred to as multi-view transductive learning. In this work we introduce and study a comprehensive generalized fixed point additive modeling framework for multi-view transductive learning, where any view is represented by a linear smoother. The problem of view selection is discussed using a generalized Akaike Information Criterion, which provides an approach for testing the contribution of each view. An efficient implementation is provided for fitting these models with both backfitting and local-scoring type algorithms adjusted to semi-supervised graph-based learning. The proposed technique is assessed on both synthetic and real data sets and is shown to be competitive with state-of-the-art co-training and graph-based techniques. Comment: Published at http://dx.doi.org/10.1214/08-AOAS202 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org
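As a rough illustration of the backfitting idea mentioned in the abstract above, the sketch below fits an additive model with one linear smoother per view. It is not the authors' fixed point framework and does not handle unlabeled observations or graph views: each view is assumed to be a single numeric variable, the smoother is plain least squares, and the names `make_ls_smoother` and `backfit` are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def make_ls_smoother(x):
    """Return a linear smoother: least-squares fit of a residual on centered x.

    Assumption: the view is one numeric variable; any linear smoother
    (spline, kernel, graph-based) could be swapped in here.
    """
    x_bar = mean(x)
    xc = [xi - x_bar for xi in x]
    sxx = sum(c * c for c in xc)

    def smooth(residual):
        slope = sum(c * r for c, r in zip(xc, residual)) / sxx
        return [slope * c for c in xc]

    return smooth


def backfit(y, smoothers, n_iter=200):
    """Classical backfitting: cycle over views, smoothing partial residuals."""
    n = len(y)
    alpha = mean(y)
    f = [[0.0] * n for _ in smoothers]
    for _ in range(n_iter):
        for j, smooth in enumerate(smoothers):
            # Residual after removing the intercept and all other views.
            partial = [
                y[i] - alpha - sum(f[k][i] for k in range(len(f)) if k != j)
                for i in range(n)
            ]
            f[j] = smooth(partial)
    return alpha, f


# Illustrative data only: two correlated "views" and an exactly additive response.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [1.0 + 2.0 * a + 3.0 * b for a, b in zip(x1, x2)]

alpha, f = backfit(y, [make_ls_smoother(x1), make_ls_smoother(x2)])
fitted = [alpha + f[0][i] + f[1][i] for i in range(len(y))]
```

Because the two views are correlated, a single pass is not enough; the repeated cycling is what lets each view's contribution settle.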

arXiv.org e-Print Archive

CiteSeerX

The Research Repository @ WVU (West Virginia University)

A Comparative Study of Different Kernel Functions Applied to LW-KPLS Model for Nonlinear Processes

Author: Ngu Joyce Chen Yen
Yeo Christine
Publication venue: 'AMG Transcend Association'
Publication date: 01/01/2022

Soft sensors are inferential estimators used when hardware sensors are inapplicable, expensive, or difficult to employ in industrial plant processes. A simple soft sensor, locally weighted partial least squares (LW-PLS), has been developed to cope with process nonlinearity. However, LW-PLS has difficulty handling strongly nonlinear process data. To address this problem, Kernel functions are integrated into LW-PLS to form locally weighted Kernel partial least squares (LW-KPLS). Little work has examined the impact of different Kernel functions within LW-KPLS, even though this model has the potential to be applied to different chemical-related nonlinear processes. Thus, this study investigates the predictive performance of LW-KPLS with several different Kernel functions using three nonlinear case studies. The results show that LW-KPLS with the Polynomial Kernel predicts better than with the other Kernel functions. The root-mean-square error (RMSE) and error of approximation (Ea) values for the training and testing datasets using this Kernel function are the lowest in their respective case studies: 34.60% to 95.39% lower for RMSE values and 68.20% to 95.49% smaller for Ea values.
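For readers unfamiliar with the quantities named in the abstract above, the sketch below shows a polynomial kernel, a Gaussian kernel for comparison, the RMSE, and the percentage reduction used to compare two RMSE values. It is not the LW-KPLS model itself; the function names, default parameters, and sample numbers are assumptions for illustration, and Ea is omitted because the abstract does not define it.

```python
import math


def polynomial_kernel(x, z, degree=2, coef0=1.0):
    """k(x, z) = (x . z + coef0) ** degree (one common parameterization)."""
    return (sum(a * b for a, b in zip(x, z)) + coef0) ** degree


def gaussian_kernel(x, z, gamma=1.0):
    """k(x, z) = exp(-gamma * ||x - z||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def percent_reduction(baseline, candidate):
    """How much lower `candidate` is than `baseline`, in percent."""
    return 100.0 * (baseline - candidate) / baseline


# Illustrative numbers only, not taken from the study.
k_poly = polynomial_kernel([1.0, 2.0], [3.0, 4.0])
k_gauss = gaussian_kernel([0.0, 0.0], [0.0, 0.0])
error = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
reduction = percent_reduction(2.0, 0.5)
```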

espace@Curtin

Optimal management of bio-based energy supply chains under parametric uncertainty through a data-driven decision-support framework

Author: Espuña Camarasa Antonio
Lupera Calahorrano Gicela Jazmín
Medina González Sergio Armando
Shokry Abdelaleem Taha Zied Ahmed
Silvente Saiz Javier
Publication venue: 'Elsevier BV'
Publication date: 01/01/2019

This paper addresses the optimal management of a multi-objective bio-based energy supply chain network subject to multiple sources of uncertainty. The complexity of obtaining an optimal solution using traditional uncertainty management methods increases dramatically with the number of uncertain factors considered. As a result, the problem, if tractable at all, is solved only after a large computational effort. Therefore, in this work a data-driven decision-making framework is proposed to address this issue. The framework exploits machine learning techniques to efficiently approximate the optimal management decisions, taking as input a set of uncertain parameters that continuously influence the process behavior. A design of computer experiments technique is used to combine these parameters and produce a matrix of representative information. These data are used to optimize the deterministic multi-objective bio-based energy network problem through conventional optimization methods, leading to a detailed (but elementary) map of the optimal management decisions based on the uncertain parameters. Afterwards, the detailed data-driven relations are described/identified using an Ordinary Kriging meta-model. The results show that the parametric meta-models predict the optimal decision variables with very high accuracy in comparison with the traditional stochastic approach. More importantly, the computational effort required to obtain these optimal values in response to changes in the uncertain parameters is dramatically reduced. Thus the proposed data-driven decision tool promotes time-effective optimal decision making, which represents a step forward in the use of data-driven strategies in large-scale/complex industrial problems. Peer Reviewed. Postprint (published version)

Model-based analysis of the potential of macroinvertebrates as indicators for microbial pathogens in rivers

Author: Cisneros Felipe
Cordova Vela Gonzalo
Díaz Granda Catalina
Goethals Peter (ORCID 0000-0003-1168-6776)
Hannula Emilia (editor)
Iñiguez Vela Xavier
Jerves Cobo Rubén (ORCID 0000-0002-7141-2390)
Morriën Elly (editor)
Nopens Ingmar (ORCID 0000-0001-6670-3700)
Van Echelpoel Wout (ORCID 0000-0001-9636-5861)
Publication venue: 'MDPI AG'
Publication date: 01/01/2018

The quality of water prior to its use for drinking, farming or recreational purposes must comply with several physicochemical and microbiological standards to safeguard society and the environment. In order to satisfy these standards, expensive analyses and highly trained laboratory personnel are required. Macroinvertebrates, by contrast, have been used as ecological indicators to assess the health of aquatic ecosystems. In this research, the relationship between microbial pathogens and macrobenthic invertebrate taxa was examined in the Machangara River, located in the southern Andes of Ecuador, where 33 sites were chosen according to their land use to collect physicochemical, microbiological and biological parameters. Decision tree models (DTMs) were used to generate rules that link the presence and abundance of some benthic families to microbial pathogen standards. These DTMs provide an indirect, approximate, and quick way of checking compliance with Ecuadorian regulations for water use related to microbial pathogens. The models, built and optimized with the WEKA package, were evaluated based on both statistical and ecological criteria to make them as clear and simple as possible. As a result, two different and reliable models were obtained, which could be used as proxy indicators in a preliminary assessment of pollution by microbial pathogens in rivers. The DTMs can be easily applied by staff with minimal training in the identification of the sensitive taxa selected by the models. The presence of selected macroinvertebrate taxa in conjunction with the decision trees can be used as a screening tool to evaluate sites that require additional follow-up analyses to confirm whether microbial water quality standards are met.

Directory of Open Access Journals

Archivsystem Ask23

Estimating Local Function Complexity via Mixture of Gaussian Processes

Author: Bui Thanh Binh
Müller Klaus-Robert
Nakajima Shinichi
Panknin Danny
Publication venue
Publication date: 28/08/2019

Real-world data often exhibit inhomogeneity, e.g., the noise level, the sampling distribution or the complexity of the target function may change over the input space. In this paper, we try to isolate local function complexity in a practical, robust way. This is achieved by first estimating the locally optimal kernel bandwidth as a functional relationship. Specifically, we propose Spatially Adaptive Bandwidth Estimation in Regression (SABER), which employs a mixture of experts consisting of multinomial kernel logistic regression as a gate and Gaussian process regression models as experts. Using the locally optimal kernel bandwidths, we deduce an estimate of the local function complexity by drawing parallels to the theory of locally linear smoothing. We demonstrate the usefulness of local function complexity for model interpretation and active learning in quantum chemistry experiments and fluid dynamics simulations. Comment: 19 pages, 16 figures
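To make the notion of a location-dependent kernel bandwidth concrete, the sketch below uses a Nadaraya-Watson kernel smoother whose bandwidth is a function of the query point. This is not SABER: the abstract's method estimates the bandwidth with a mixture of Gaussian process experts, whereas here `example_bandwidth` is a hand-written, hypothetical stand-in, and the data are made up.

```python
import math


def nw_estimate(x0, xs, ys, bandwidth):
    """Nadaraya-Watson estimate at x0 with a Gaussian kernel.

    `bandwidth` is a function of the query point, so the amount of
    smoothing can change over the input space.
    """
    h = bandwidth(x0)
    weights = [math.exp(-0.5 * ((x0 - x) / h) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


def example_bandwidth(x):
    # Hypothetical rule: smooth less where the function is assumed to vary fast.
    return 0.5 if x < 0 else 1.0


# Illustrative data only.
xs = [-1.0, 0.0, 1.0]
ys = [0.0, 1.0, 2.0]
estimate_at_zero = nw_estimate(0.0, xs, ys, example_bandwidth)
estimate_constant = nw_estimate(-0.3, xs, [4.0, 4.0, 4.0], example_bandwidth)
```

A small bandwidth follows the data closely and suits regions where the target function is complex; a large one averages over more neighbours and suits smooth regions.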

arXiv.org e-Print Archive