
    CUSTOMER SATISFACTION MEASUREMENT MODELS: GENERALISED MAXIMUM ENTROPY APPROACH

    This paper presents the methodology of the Generalised Maximum Entropy (GME) approach for estimating linear models that contain latent variables, such as customer satisfaction measurement models. The GME approach is a distribution-free method and provides a better alternative to the conventional method, namely Partial Least Squares (PLS), which is used in the context of customer satisfaction measurement. A simplified version of the model used for the Swedish customer satisfaction index (CSI) has been used to generate simulated data in order to study the performance of GME and PLS. The results showed that GME outperforms PLS in terms of mean square error (MSE). Simulated data were also used to compute the CSI with the GME approach. Keywords: Generalised Maximum Entropy, Partial Least Squares, Customer Satisfaction Models.

    A comparative evaluation of the generalised predictive ability of eight machine learning algorithms across ten clinical metabolomics data sets for binary classification

    Introduction: Metabolomics is increasingly being used in the clinical setting for disease diagnosis, prognosis and risk prediction. Machine learning algorithms are particularly important in the construction of multivariate metabolite prediction models. Historically, partial least squares (PLS) regression has been the gold standard for binary classification. Nonlinear machine learning methods such as random forests (RF), kernel support vector machines (SVM) and artificial neural networks (ANN) may be more suited to modelling possible nonlinear metabolite covariance, and thus provide better predictive models. Objectives: We hypothesise that for binary classification using metabolomics data, nonlinear machine learning methods will provide superior generalised predictive ability when compared to linear alternatives, in particular the current gold standard, PLS discriminant analysis. Methods: We compared the generalised predictive performance of eight archetypal machine learning algorithms across ten publicly available clinical metabolomics data sets. The algorithms were implemented in the Python programming language. All code and results have been made publicly available as Jupyter notebooks. Results: There was only marginal improvement in predictive ability for SVM and ANN over PLS across all data sets. RF performance was comparatively poor. The use of out-of-bag bootstrap confidence intervals provided a measure of uncertainty in model prediction, such that the quality of the metabolomics data was observed to be a bigger influence on generalised performance than model choice. Conclusion: The size of the data set, and the choice of performance metric, had a greater influence on generalised predictive performance than the choice of machine learning algorithm.

    Regression with Distance Matrices

    Data types that lie in metric spaces but not in vector spaces are difficult to use within the usual regression setting, either as the response and/or as a predictor. We represent the information in these variables using distance matrices, which requires only the specification of a distance function. A low-dimensional representation of such distance matrices can be obtained using methods such as multidimensional scaling. Once these variables have been represented as scores, an internal model linking the predictors and the response can be developed using standard methods. We call scoring the transformation from a new observation to a score, while backscoring is a method to represent a score as an observation in the data space. Both methods are essential for prediction and explanation. We illustrate the methodology for shape data, unregistered curve data and correlation matrices using motion capture data from an experiment to study the motion of children with cleft lip. Comment: 18 pages, 7 figures.
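    The scoring step the abstract describes, turning a distance matrix into low-dimensional scores, can be sketched with classical multidimensional scaling in plain NumPy; the four collinear points below are an illustrative assumption, not the paper's motion-capture data.

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical multidimensional scaling: embed an n x n distance
    matrix into k dimensions (the 'scores' used as regression inputs)."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centring matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centred squared distances
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:k]            # keep the largest eigenvalues
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))

# Points on a line: build the pairwise-distance matrix, then recover
# one-dimensional scores whose mutual distances reproduce D exactly.
x = np.array([0.0, 1.0, 3.0, 6.0])
D = np.abs(x[:, None] - x[None, :])
scores = classical_mds(D, k=1).ravel()
```

    In the paper's setting the scores would then feed a standard regression model, with backscoring mapping predicted scores back into the data space.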

    Latent variable regression and applications to planetary seismic instrumentation

    The work presented in this thesis is framed by the concept of latent variables, a modern data analytics approach. A latent variable represents an extracted component of a dataset which is not directly measured. The concept is first applied to combat the problem of ill-posed regression through the promising method of partial least squares (PLS). In this context the latent variables within a data matrix are extracted through an iterative algorithm based on cross-covariance as an optimisation criterion. This work first extends the PLS algorithm, using adaptive and recursive techniques, for online, non-stationary data applications. The standard PLS algorithm is further generalised for complex-, quaternion- and tensor-valued data. In doing so it is shown that the multidimensional algebras facilitate physically meaningful representations, demonstrated through smart-grid frequency estimation and image-classification tasks. The second part of the thesis uses this knowledge to inform a performance analysis of the MEMS microseismometer implemented for the InSight mission to Mars. This is given in terms of the sensor's intrinsic self-noise, the estimation of which is achieved from experimental data with a colocated instrument. The standard coherence and proposed delta noise estimators are analysed with respect to practical issues. The implementation of algorithms for the alignment, calibration and post-processing of the data then enabled a definitive self-noise estimate, validated from data acquired in an ultra-quiet, deep-space environment. A method for the decorrelation of the microseismometer's output from its thermal response is proposed. To do so, a novel sensor fusion approach based on the Kalman filter is developed for a full-band transfer-function correction, in contrast to the traditional ill-posed frequency-division method. This algorithm was applied to experimental data, which determined the thermal model coefficients while validating the sensor's performance at tidal frequencies (1e-5 Hz) and in extreme environments at -65 °C. This thesis, therefore, provides a definitive view of the latent variables perspective. This is achieved through the general algorithms developed for regression with multidimensional data and the bespoke application to seismic instrumentation.
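    The thesis's full-band Kalman correction is not reproduced here; a heavily simplified scalar Kalman filter that estimates an unknown thermal gain gives the flavour of the recursive estimation involved. The linear gain model and the noise levels are assumptions chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy model: sensor output = a * temperature + noise, with unknown gain a.
# A scalar Kalman filter estimates a recursively; the thesis's actual
# correction operates on a full transfer function, not a single gain.
a_true = 0.8
T = rng.normal(size=500)                     # measured temperature
y = a_true * T + 0.1 * rng.normal(size=500)  # sensor output

a_hat, P, q, r = 0.0, 1.0, 1e-6, 0.01        # state, covariance, noise terms
for t, yt in zip(T, y):
    P += q                                   # predict (random-walk state)
    K = P * t / (t * P * t + r)              # Kalman gain for observation H = t
    a_hat += K * (yt - t * a_hat)            # update with the innovation
    P *= (1 - K * t)
print(f"estimated gain: {a_hat:.3f}")
```

    With the gain estimated, the thermal contribution a_hat * T could be subtracted from the output, which is the decorrelation idea in miniature.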

    Intraday forecasts of a volatility index: Functional time series methods with dynamic updating

    As a forward-looking measure of future equity market volatility, the VIX index has gained immense popularity in recent years to become a key measure of risk for market analysts and academics. We consider discrete reported intraday VIX tick values as realisations of a collection of curves observed sequentially on equally spaced and dense grids over time, and utilise functional data analysis techniques to produce one-day-ahead forecasts of these curves. The proposed method facilitates the investigation of dynamic changes in the index over very short time intervals, as showcased using the 15-second high-frequency VIX index values. With the help of dynamic updating techniques, our point and interval forecasts are shown to enjoy improved accuracy over conventional time series models. Comment: 29 pages, 5 figures. To appear in the Annals of Operations Research.
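    One simple functional-forecasting scheme in the spirit of the abstract, centring the daily curves, extracting a leading functional principal component, and carrying its last score forward, can be sketched in NumPy. The sinusoidal toy curves are an assumption; the paper's dynamic-updating machinery is considerably richer.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy "daily curves": each row is one day's index observed on a dense grid.
grid = np.linspace(0, 1, 50)
days = 60
day_levels = 0.9 ** np.arange(days)[::-1] + 0.05 * rng.normal(size=days)
curves = day_levels[:, None] * np.sin(2 * np.pi * grid)[None, :]

# Functional PCA via SVD of the centred curves: the leading right
# singular vector is the first functional principal component.
mean_curve = curves.mean(axis=0)
U, s, Vt = np.linalg.svd(curves - mean_curve, full_matrices=False)
pc = Vt[0]                                 # leading principal component
scores = (curves - mean_curve) @ pc        # one score per day

# Naive (last-value) forecast of tomorrow's score, mapped back to a curve.
forecast = mean_curve + scores[-1] * pc
```

    A real forecaster would model the score series with a proper time series model and update the forecast intraday as new ticks arrive.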

    Deriving statistical inference from the application of artificial neural networks to clinical metabolomics data

    Metabolomics data are complex, with a high degree of multicollinearity. As such, multivariate linear projection methods, such as partial least squares discriminant analysis (PLS-DA), have become standard. Non-linear projection methods, typified by Artificial Neural Networks (ANNs), may be more appropriate for modelling potential nonlinear latent covariance; however, they are not widely used due to the difficulty of deriving statistical inference, and thus biological interpretation, from them. Herein, we illustrate the utility of ANNs for clinical metabolomics using publicly available data sets and develop an open framework for deriving and visualising statistical inference from ANNs equivalent to standard PLS-DA methods.
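    One model-agnostic route to variable-level inference from an ANN is permutation importance, sketched below with scikit-learn on a synthetic data set; whether this matches the paper's exact framework is not asserted here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.neural_network import MLPClassifier

# Synthetic binary-classification data standing in for a metabolomics set.
X, y = make_classification(n_samples=200, n_features=10, n_informative=3,
                           random_state=4)

# Small ANN, then permutation importance: shuffle each feature in turn
# and measure the drop in score, giving a per-metabolite influence rank
# analogous to PLS-DA variable-importance measures.
ann = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                    random_state=4).fit(X, y)
result = permutation_importance(ann, X, y, n_repeats=10, random_state=4)
top = np.argsort(result.importances_mean)[::-1][:3]
print("most influential features:", top)
```

    The spread across `n_repeats` also yields an uncertainty estimate for each feature's importance, which is the ingredient needed for inference rather than bare ranking.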

    Key issues on partial least squares (PLS) in operations management research: A guide to submissions

    Purpose: This work aims to systematise the use of PLS as an analysis tool via a usage guide, with recommendations to help researchers eliminate errors when using this tool. Design/methodology/approach: A recent literature review about PLS and discussion with experts in the methodology. Findings: This article considers the current situation of PLS after intense academic debate in recent years, and summarises recommendations on how to properly conduct and report a research work that uses this methodology in its analyses. We particularly focus on how to: choose the construct type; choose the estimation technique (PLS or CB-SEM); evaluate and report the measurement model; evaluate and report the structural model; and analyse statistical power. Research limitations: It was impossible to cover some relevant aspects in detail herein: presenting a guided example that respects all the reporting recommendations given here, to act as a practical guide for authors; whether the specification or evaluation of the measurement model differs for first-order versus second-order constructs; how the outcomes of constructs are interpreted when the indicators are measured on nominal measurement levels; and whether the Confirmatory Composite Analysis approach is compatible with recent proposals about Confirmatory Tetrad Analysis (CTA). These themes will be the object of later publications. Originality/value: We provide a checklist of the information elements that any article using PLS must contain. Our intention is for the article to act as a guide for researchers and prospective authors who submit works to the JIEM (Journal of Industrial Engineering and Management). This guide could also be used by editors and reviewers of JIEM, or of other journals in this area, to evaluate and reduce the risk of bias (Losilla, Oliveras, Marin-Garcia & Vives, 2018) in works using PLS as an analysis procedure.
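    Two of the measurement-model statistics such a reporting checklist typically asks for, composite reliability (CR) and average variance extracted (AVE), reduce to short formulas over standardised outer loadings. The loadings below are hypothetical, chosen only to illustrate the conventional CR > 0.7 and AVE > 0.5 cut-offs.

```python
import numpy as np

def composite_reliability(loadings):
    """Composite reliability of a reflective construct from its
    standardised outer loadings: (sum lam)^2 / ((sum lam)^2 + sum(1 - lam^2))."""
    lam = np.asarray(loadings, dtype=float)
    return lam.sum() ** 2 / (lam.sum() ** 2 + (1 - lam ** 2).sum())

def average_variance_extracted(loadings):
    """AVE: mean squared standardised loading."""
    lam = np.asarray(loadings, dtype=float)
    return (lam ** 2).mean()

# Hypothetical loadings for a three-indicator reflective construct.
lam = [0.82, 0.78, 0.74]
cr = composite_reliability(lam)
ave = average_variance_extracted(lam)
print(f"CR = {cr:.3f}, AVE = {ave:.3f}")
```

    Reporting both values per construct, alongside the structural-model results, is the kind of item such a checklist would verify.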

    Tensor Networks for Dimensionality Reduction and Large-Scale Optimizations. Part 2 Applications and Future Perspectives

    Part 2 of this monograph builds on the introduction to tensor networks and their operations presented in Part 1. It focuses on tensor network models for super-compressed higher-order representation of data/parameters and related cost functions, while providing an outline of their applications in machine learning and data analytics. A particular emphasis is on the tensor train (TT) and Hierarchical Tucker (HT) decompositions, and their physically meaningful interpretations which reflect the scalability of the tensor network approach. Through a graphical approach, we also elucidate how, by virtue of the underlying low-rank tensor approximations and sophisticated contractions of core tensors, tensor networks have the ability to perform distributed computations on otherwise prohibitively large volumes of data/parameters, thereby alleviating or even eliminating the curse of dimensionality. The usefulness of this concept is illustrated over a number of applied areas, including generalized regression and classification (support tensor machines, canonical correlation analysis, higher-order partial least squares), generalized eigenvalue decomposition, Riemannian optimization, and the optimization of deep neural networks. Part 1 and Part 2 of this work can be used either as stand-alone separate texts, or indeed as a conjoint comprehensive review of the exciting field of low-rank tensor networks and tensor decompositions. Comment: 232 pages.
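    The tensor train decomposition the monograph emphasises can be sketched in NumPy via the TT-SVD algorithm (sequential truncated SVDs along the modes); the rank-1 test tensor below is an illustrative assumption.

```python
import numpy as np

def tt_svd(tensor, eps=1e-10):
    """Decompose a d-way array into tensor-train (TT) cores via
    sequential truncated SVDs, keeping singular values above eps * s_max."""
    dims = tensor.shape
    cores, rank = [], 1
    mat = tensor.reshape(rank * dims[0], -1)
    for k in range(len(dims) - 1):
        U, s, Vt = np.linalg.svd(mat, full_matrices=False)
        r_new = max(1, int((s > eps * s[0]).sum()))
        cores.append(U[:, :r_new].reshape(rank, dims[k], r_new))
        mat = (s[:r_new, None] * Vt[:r_new]).reshape(r_new * dims[k + 1], -1)
        rank = r_new
    cores.append(mat.reshape(rank, dims[-1], 1))
    return cores

# Rank-1 three-way tensor: its TT cores all have rank 1, and
# contracting them back reproduces the original tensor.
X = np.einsum("i,j,k->ijk", np.arange(2.0), np.arange(3.0) + 1, np.arange(4.0) + 1)
cores = tt_svd(X)
recon = np.einsum("aib,bjc,ckd->ijkd", *cores)[..., 0]
```

    For higher-order data the point is that storage scales with the sum of core sizes rather than the product of mode sizes, which is how TT representations sidestep the curse of dimensionality.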