260 research outputs found

    Robust fuzzy clustering for multiple instance regression.

    Multiple instance regression (MIR) operates on a collection of bags, where each bag contains multiple instances sharing an identical real-valued label. Only a few instances, called primary instances, contribute to the bag label; the remaining instances are noise and outlier observations. The goal in MIR is to identify the primary instances within each bag and learn a regression model that can predict the label of a previously unseen bag. In this thesis, we introduce an algorithm that uses robust fuzzy clustering with an appropriate distance to learn multiple linear models simultaneously from a noisy feature space. We show that fuzzy memberships are useful in allowing instances to belong to multiple models, while possibilistic memberships allow identification of the primary instances of each bag with respect to each model. We also use possibilistic memberships to identify and ignore noisy instances and to determine the optimal number of regression models. We evaluate our approach on a series of synthetic data sets, on remote sensing data for predicting the yearly average yield of a crop, and on drug activity prediction. We show that our approach achieves higher accuracy than existing methods.
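
    As an illustration of the general idea only (the thesis's exact algorithm, distance, and robustness terms are not reproduced here), the following sketch fits several linear models to a pool of noisy instances using fuzzy c-regression-style memberships plus a possibilistic down-weighting of outliers. All names, parameter values, and the toy data are assumptions made for the example.

    import numpy as np

    rng = np.random.default_rng(0)

    def fit_multiple_linear_models(X, y, K=2, m=2.0, eta=1.0, n_iter=50):
        """Fuzzy c-regression-style sketch with a possibilistic outlier weight (illustrative)."""
        n, d = X.shape
        Xb = np.hstack([X, np.ones((n, 1))])        # add intercept column
        B = rng.normal(size=(K, d + 1))             # one coefficient vector per model
        for _ in range(n_iter):
            r2 = (y[:, None] - Xb @ B.T) ** 2 + 1e-12          # squared residuals per model
            u = (1.0 / r2) ** (1.0 / (m - 1))                  # fuzzy memberships (relative fit)
            u /= u.sum(axis=1, keepdims=True)
            w = np.exp(-r2 / eta)                              # possibilistic-style typicality (absolute fit)
            for k in range(K):                                 # weighted least-squares update of each model
                wk = (u[:, k] ** m) * w[:, k]
                A = Xb.T @ (wk[:, None] * Xb) + 1e-6 * np.eye(d + 1)
                B[k] = np.linalg.solve(A, Xb.T @ (wk * y))
        return B, u, w

    # toy data: two linear structures plus injected outliers (assumed for illustration)
    X = rng.uniform(-1, 1, size=(200, 1))
    y = np.where(rng.random(200) < 0.5, 3 * X[:, 0] + 1, -2 * X[:, 0] - 1)
    y[:20] += rng.normal(0, 5, size=20)
    B, u, w = fit_multiple_linear_models(X, y)
    print("recovered coefficient vectors (slope, intercept):\n", B.round(2))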

    Economic and regulatory uncertainty in renewable energy system design: a review

    Renewable energy is mobilizing increasing investment around the globe. However, little attention has been paid to evaluating economic and regulatory (E&R) uncertainties, despite their enormous impact on project cash flows. Consequently, this review analyzes, classifies, and discusses 130 articles dealing with the design of renewable energy projects under E&R uncertainties. After surveying the literature, identifying the selected manuscripts and the few previous reviews on the matter, the following innovative categorization is proposed: sources of uncertainty, uncertainty characterization methods, problem formulations, solution methods, and regulatory frameworks. The classification reveals that electricity price is the most considered source of uncertainty, often alone, despite the existence of six other equally influential groups of E&R uncertainties. In addition, real options and optimization arise as the two main approaches researchers use to solve problems in energy system design. Subsequently, the following aspects of interest are discussed in depth: how modeling can be improved, which variables are the most influential, and potential lines of research. Conclusions show the necessity of modeling E&R uncertainties with currently underrepresented methods, suggest several policy recommendations, and encourage the integration of prevailing approaches.

    A survey of kernel and spectral methods for clustering

    Clustering algorithms are a useful tool to explore data structures and have been employed in many disciplines. The focus of this paper is the partitioning clustering problem, with a special interest in two recent approaches: kernel and spectral methods. The aim of this paper is to present a survey of kernel and spectral clustering methods, two approaches able to produce nonlinear separating hypersurfaces between clusters. The presented kernel clustering methods are the kernel versions of many classical clustering algorithms, e.g., K-means, SOM, and neural gas. Spectral clustering arises from concepts in spectral graph theory, and the clustering problem is configured as a graph cut problem in which an appropriate objective function has to be optimized. Since these two seemingly different approaches have been shown to share the same mathematical foundation, an explicit proof that they have the same objective is reported. In addition, fuzzy kernel clustering methods are presented as extensions of the kernel K-means clustering algorithm.
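
    The graph-cut view of spectral clustering described above can be illustrated with a minimal sketch: build an RBF affinity graph, form the symmetric normalized Laplacian, embed the points with its bottom eigenvectors, and run ordinary K-means in that space. The kernel width and cluster count below are illustrative assumptions, not values from the survey.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_moons

    # two-moons data and illustrative parameter choices (assumptions)
    X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)
    sigma, k = 0.2, 2

    # RBF affinity matrix and node degrees
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq_dists / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    deg = W.sum(axis=1)

    # symmetric normalized Laplacian L = I - D^(-1/2) W D^(-1/2)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    L = np.eye(len(X)) - D_inv_sqrt @ W @ D_inv_sqrt

    # embed each point with the eigenvectors of the k smallest eigenvalues,
    # normalize rows, then run ordinary K-means in the embedded space
    _, eigvecs = np.linalg.eigh(L)
    U = eigvecs[:, :k]
    U = U / (np.linalg.norm(U, axis=1, keepdims=True) + 1e-12)
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(U)
    print("cluster sizes:", np.bincount(labels))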

    Hyperspectral Unmixing Overview: Geometrical, Statistical, and Sparse Regression-Based Approaches

    Imaging spectrometers measure electromagnetic energy scattered in their instantaneous field of view in hundreds or thousands of spectral channels with higher spectral resolution than multispectral cameras. Imaging spectrometers are therefore often referred to as hyperspectral cameras (HSCs). Higher spectral resolution enables material identification via spectroscopic analysis, which facilitates countless applications that require identifying materials in scenarios unsuitable for classical spectroscopic analysis. Due to the low spatial resolution of HSCs, microscopic material mixing, and multiple scattering, spectra measured by HSCs are mixtures of the spectra of the materials in a scene. Thus, accurate estimation requires unmixing. Pixels are assumed to be mixtures of a few materials, called endmembers. Unmixing involves estimating all or some of: the number of endmembers, their spectral signatures, and their abundances at each pixel. Unmixing is a challenging, ill-posed inverse problem because of model inaccuracies, observation noise, environmental conditions, endmember variability, and data set size. Researchers have devised and investigated many models in the search for robust, stable, tractable, and accurate unmixing algorithms. This paper presents an overview of unmixing methods from the time of Keshava and Mustard's unmixing tutorial [1] to the present. Mixing models are discussed first. Signal-subspace, geometrical, statistical, sparsity-based, and spatial-contextual unmixing algorithms are described. Mathematical problems and potential solutions are described, and algorithm characteristics are illustrated experimentally. This work has been accepted for publication in the IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing.
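
    A minimal sketch of the linear mixing model underlying much of this literature: a pixel spectrum is modeled as a nonnegative, sum-to-one combination of endmember signatures, and the abundances are recovered here with nonnegative least squares plus the usual row-augmentation trick for the sum-to-one constraint. The endmembers and pixel are synthetic assumptions; this is not any specific algorithm from the overview.

    import numpy as np
    from scipy.optimize import nnls

    rng = np.random.default_rng(1)
    n_bands, n_endmembers = 50, 3

    # synthetic endmember signatures and a mixed pixel (assumptions for the example)
    M = np.abs(rng.normal(size=(n_bands, n_endmembers)))   # endmember spectra, one per column
    a_true = np.array([0.6, 0.3, 0.1])                     # true abundances: nonnegative, sum to one
    y = M @ a_true + rng.normal(scale=0.01, size=n_bands)  # observed pixel under the linear mixing model

    # fully constrained least squares (approximate): append a heavily weighted row of
    # ones so that nonnegative least squares also pushes the abundances to sum to one
    delta = 1e3
    M_aug = np.vstack([M, delta * np.ones((1, n_endmembers))])
    y_aug = np.append(y, delta)
    a_hat, _ = nnls(M_aug, y_aug)
    print("estimated abundances:", np.round(a_hat, 3))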

    On Sharp Identification Regions for Regression Under Interval Data

    The reliable analysis of interval data (coarsened data) is one of the most promising applications of imprecise probabilities in statistics. If one refrains from making untestable, and often materially unjustified, strong assumptions on the coarsening process, then the empirical distribution of the data is imprecise, and statistical models are, in Manski's terms, partially identified. We first elaborate some subtle differences between two natural ways of handling interval data in the dependent variable of regression models, distinguishing between two different types of identification regions, called the Sharp Marrow Region (SMR) and the Sharp Collection Region (SCR) here. Focusing on the case of linear regression analysis, we then derive some fundamental geometrical properties of the SMR and SCR, allowing a comparison of the regions and providing some guidelines for their canonical construction. Relying on the algebraic framework of adjunctions of two mappings between partially ordered sets, we characterize the SMR as a right adjoint and as the monotone kernel of a criterion-function-based mapping, while the SCR is interpretable as the corresponding monotone hull. Finally, we sketch some ideas on a compromise between the SMR and SCR based on a set-domained loss function. This paper is an extended version of a shorter paper with the same title that has been conditionally accepted for publication in the Proceedings of the Eighth International Symposium on Imprecise Probability: Theories and Applications. In the present paper we have added proofs and a seventh chapter with a small Monte Carlo illustration that would have made the original paper too long.
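
    As a rough illustration (not the paper's canonical construction of the SMR or SCR), one can approximate a collection-type region by Monte Carlo: sample admissible selections of the interval-valued responses, fit ordinary least squares to each selection, and look at the spread of the resulting coefficient vectors. The data and interval widths below are assumptions.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 60

    # covariate and an interval-valued (coarsened) response, both assumed for the example
    x = rng.uniform(0, 1, n)
    X = np.column_stack([np.ones(n), x])
    y_center = 1.0 + 2.0 * x + rng.normal(scale=0.2, size=n)
    y_lo, y_hi = y_center - 0.5, y_center + 0.5

    # every admissible selection of the response inside its interval yields one OLS fit;
    # the spread of these fits crudely approximates a collection-type identification region
    coefs = []
    for _ in range(2000):
        y_sel = rng.uniform(y_lo, y_hi)
        beta, *_ = np.linalg.lstsq(X, y_sel, rcond=None)
        coefs.append(beta)
    coefs = np.array(coefs)

    print("intercept range:", coefs[:, 0].min().round(3), "to", coefs[:, 0].max().round(3))
    print("slope range:    ", coefs[:, 1].min().round(3), "to", coefs[:, 1].max().round(3))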

    Beyond probabilities: A possibilistic framework to interpret ensemble predictions and fuse imperfect sources of information

    Ensemble forecasting is widely used in medium-range weather prediction to account for the uncertainty that is inherent in the numerical prediction of high-dimensional, nonlinear systems with high sensitivity to initial conditions. Ensemble forecasting allows one to sample possible future scenarios in a Monte-Carlo-like approximation through small strategic perturbations of the initial conditions and, in some cases, through stochastic parametrization schemes of the atmosphere-ocean dynamical equations. Results are generally interpreted in a probabilistic manner by turning the ensemble into a predictive probability distribution. Yet, due to model bias and dispersion errors, this interpretation is often not reliable, and statistical postprocessing is needed to reach probabilistic calibration. This is all the more true for extreme events which, for dynamical reasons, cannot generally be associated with a significant density of ensemble members. In this work we propose a novel approach: a possibilistic interpretation of ensemble predictions, taking inspiration from possibility theory. This framework allows us to integrate, in a consistent manner, other imperfect sources of information, such as the insight about the system dynamics provided by the analogue method. We thereby show that probability distributions may not be the best way to extract the valuable information contained in ensemble prediction systems, especially for long lead times. Indeed, shifting to possibility theory provides more meaningful results without the need to resort to additional calibration, while maintaining or improving forecast skill. Our approach is tested on an imperfect version of the Lorenz '96 model, and results for extreme event prediction are compared against those given by a standard probabilistic ensemble dressing.
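
    One simple way to move from an ensemble to a possibility distribution, shown below, is the standard probability-to-possibility transform applied to the empirical histogram of the members. This only illustrates the shift in interpretation; it is not the specific possibilistic framework or the analogue-based fusion proposed in the paper, and the toy ensemble and bin count are assumptions.

    import numpy as np

    rng = np.random.default_rng(3)
    ensemble = rng.normal(loc=2.0, scale=1.0, size=50)   # toy 50-member ensemble forecast (assumed)

    # empirical bin probabilities of the ensemble
    counts, edges = np.histogram(ensemble, bins=10)
    p = counts / counts.sum()

    # probability-to-possibility transform: the possibility of a bin is the total
    # probability of all bins that are no more probable than it
    pi = np.array([p[p <= pj].sum() for pj in p])

    for j in range(len(p)):
        print(f"bin [{edges[j]:5.2f}, {edges[j + 1]:5.2f}):  prob={p[j]:.2f}  poss={pi[j]:.2f}")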

    Advances in transfer learning methods based on computational intelligence

    Traditional machine learning and data mining have made tremendous progress in many knowledge-based areas, such as clustering, classification, and regression. However, the primary assumption in all of these areas is that the training and testing data come from the same domain and have the same distribution. This assumption is difficult to satisfy in real-world applications due to the limited availability of labeled data. Associated data in different domains can be used to expand the availability of prior knowledge about future target data. In recent years, transfer learning has been used to address such cross-domain learning problems by using information from data in a related domain and transferring that knowledge to the target task. The transfer learning methodology is applied in this work with both unsupervised and supervised learning methods. For unsupervised learning, a novel transfer-learning possibilistic c-means (TLPCM) algorithm is proposed to handle the PCM clustering problem in a domain that has insufficient data. Moreover, TLPCM overcomes the problem of differing numbers of clusters between the source and target domains. The proposed algorithm employs the historical cluster centers of the source data as a reference to guide the clustering of the target data. The experimental studies demonstrate the advantages of TLPCM on both synthetic and real-world transfer datasets. For supervised learning, a transfer learning (TL) technique is used to pre-train a CNN model on posture data and then fine-tune it on sleep stage data. We used a ballistocardiography (BCG) bed sensor to collect both posture and sleep stage data to provide a non-invasive, in-home monitoring system that tracks changes in the subjects' health over time. The quality of sleep has a significant impact on health and quality of life. This study adopts hierarchical and non-hierarchical classification structures to develop an automatic sleep stage classification system using ballistocardiogram (BCG) signals. A leave-one-subject-out cross-validation (LOSO-CV) procedure is used for testing classification performance in most of the experiments. Convolutional Neural Networks (CNNs), Long Short-Term Memory (LSTM) networks, and Deep Neural Networks (DNNs) are complementary in their modeling capabilities: CNNs have the advantage of reducing frequency variations, while LSTMs are good at temporal modeling. Polysomnography (PSG) data from a sleep lab were used as the ground truth for sleep stages, with emphasis on three stages: awake, rapid eye movement (REM) sleep, and non-REM (NREM) sleep. Moreover, a transfer learning approach is employed with supervised learning to address the cross-resident training problem in predicting early signs of illness. We validate our method by conducting a retrospective study on three residents from TigerPlace, a retirement community in Columbia, MO, where apartments are fitted with wireless networks of motion and bed sensors. Predicting the early signs of illness in older adults using a continuous, unobtrusive nursing home monitoring system has been shown to increase quality of life and decrease care costs. Illness prediction is based on sensor data and uses algorithms such as support vector machines (SVM) and k-nearest neighbors (kNN). One of the most significant challenges in developing prediction algorithms for sensor networks is using knowledge from previous residents to predict the behaviors of new ones. Each day, the presence or absence of illness was manually evaluated using nursing visit reports from a homegrown electronic medical record (EMR) system. In this work, the transfer learning SVM approach outperformed three other methods, i.e., regular SVM, one-class SVM, and one-class kNN.
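
    A rough sketch of the transfer idea behind TLPCM as described above: cluster the target data with a possibilistic c-means-style loop while pulling each center toward the historical center learned on the source domain. The update rules, the blending weight lambda_t, and the toy data are assumptions made for the illustration, not the dissertation's exact algorithm.

    import numpy as np

    rng = np.random.default_rng(4)

    def transfer_pcm(X, source_centers, eta=1.0, m=2.0, lambda_t=0.3, n_iter=30):
        """Possibilistic c-means-style loop with a pull toward source-domain centers (illustrative)."""
        centers = source_centers.copy()                       # initialize from the source domain
        for _ in range(n_iter):
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1) + 1e-12
            t = 1.0 / (1.0 + (d2 / eta) ** (1.0 / (m - 1)))   # possibilistic typicalities per cluster
            tm = t ** m
            data_centers = (tm[:, :, None] * X[:, None, :]).sum(0) / tm.sum(0)[:, None]
            # transfer step: blend target-data evidence with the historical source centers
            centers = (1 - lambda_t) * data_centers + lambda_t * source_centers
        return centers, t

    # toy target data drawn near two shifted versions of the source centers (assumed)
    source_centers = np.array([[0.0, 0.0], [4.0, 4.0]])
    X = np.vstack([rng.normal([0.5, 0.5], 0.3, (100, 2)),
                   rng.normal([4.5, 3.5], 0.3, (100, 2))])
    centers, t = transfer_pcm(X, source_centers)
    print("adapted cluster centers:\n", centers.round(2))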

    Study of the dependencies between in-service degradation and key design parameters with uncertainty for mechanical components.

    The design features of a machine component can significantly impact its in-service life, yet only relatively few case-specific studies have addressed this. Hence, there is a need for a better understanding of the influence of geometric design features on the service life of a machine component. The aim of this research is to develop a methodology for assessing the degradation life of a mechanical component as influenced by its geometric design in the presence of uncertainties, and to apply it to the optimisation of the component under these uncertainties. This thesis proposes a novel methodology for assessing thermal fatigue life, a degradation mechanism, based on the influence of design features in the presence of uncertainties. In this research, a novel uncertainty analysis methodology that can simultaneously handle aleatory and epistemic uncertainties is proposed for a more realistic prediction and assessment of a component's thermal fatigue degradation life estimated using finite element analysis. A design optimisation method for optimising the component's design in the presence of mixed (aleatory and epistemic) uncertainties is also proposed and developed. The performance of the proposed methodology is analysed using passenger vehicle brake discs. The novel uncertainty quantification methodology was initially applied to a solid brake disc and validated for generalisability on a vented brake disc, which has more complex design features, while the proposed optimisation method was applied to the vented brake disc. With these, this research provides a validated set of uncertainty and optimisation methodologies for a design problem in the presence of mixed uncertainties. The methodologies proposed in this research enable design engineers to design robust components by identifying the design with the least uncertainty in its output arising from the inherent variability of the design parameters, while simultaneously providing the design with the least uncertainty in the estimate of its life resulting from the use of surrogate models.
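
    A common way to handle mixed aleatory and epistemic uncertainty, sketched below as an assumption about the general approach rather than the thesis's validated methodology, is a double loop: epistemic parameters are swept over their intervals in an outer loop, aleatory parameters are sampled by Monte Carlo in an inner loop, and a cheap surrogate stands in for the finite element model of the component's fatigue life.

    import numpy as np

    rng = np.random.default_rng(5)

    def surrogate_life(thickness, load_scatter):
        # placeholder surrogate (assumption): life grows with thickness, drops with load
        return 1e5 * thickness / (1.0 + load_scatter)

    lives_lo, lives_hi = [], []
    for thickness in np.linspace(0.9, 1.1, 11):           # outer loop: epistemic interval sweep
        load = rng.normal(loc=1.0, scale=0.1, size=2000)  # inner loop: aleatory Monte Carlo samples
        life = surrogate_life(thickness, load)
        lives_lo.append(np.percentile(life, 5))
        lives_hi.append(np.percentile(life, 95))

    # p-box-like summary: bounds on the 5th/95th percentile life over the epistemic range
    print("worst-case 5th percentile life:", round(min(lives_lo)))
    print("best-case 95th percentile life:", round(max(lives_hi)))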