58 research outputs found

Cross-product Penalized Component Analysis (XCAN)

Author: Acar Evrim
Bro Rasmus
Camacho José
Rasmussen Morten A.
Publication venue: 'Elsevier BV'
Publication date: 28/06/2019

Matrix factorization methods are extensively employed to understand complex data. In this paper, we introduce the cross-product penalized component analysis (XCAN), a sparse matrix factorization based on the optimization of a loss function that allows a trade-off between variance maximization and structural preservation. The approach is based on previous developments, notably (i) the Sparse Principal Component Analysis (SPCA) framework based on the LASSO, (ii) extensions of SPCA to constrain both modes of the factorization, like co-clustering or the Penalized Matrix Decomposition (PMD), and (iii) the Group-wise Principal Component Analysis (GPCA) method. The result is a flexible modeling approach that can be used for data exploration in a large variety of problems. We demonstrate its use with applications from different disciplines.

arXiv.org e-Print Archive

Copenhagen University Research Information System

Estimation of variance components, heritability and the ridge penalty in high-dimensional generalized linear models

Author: Leday GGR
van de Wiel MA
Veerman JR
Publication venue: Communications in Statistics: Simulation and Computation
Publication date: 01/01/2022

For high-dimensional linear regression models, we review and compare several estimators of the variances τ² and σ² of the random slopes and errors, respectively. These variances relate directly to the ridge regression penalty λ and the heritability index h², often used in genetics. Several estimators of these, either based on cross-validation (CV) or maximum marginal likelihood (MML), are also discussed. The comparisons include several cases of the high-dimensional covariate matrix such as multi-collinear covariates and data-derived ones. Moreover, we study robustness against model misspecifications such as sparse instead of dense effects and non-Gaussian errors. An example on weight gain data with genomic covariates confirms the good performance of MML compared to CV. Several extensions are presented. First, to the high-dimensional linear mixed effects model, with REML as an alternative to MML. Second, to the conjugate Bayesian setting, shown to be a good alternative. Third, and most prominently, to generalized linear models for which we derive a computationally efficient MML estimator by re-writing the marginal likelihood as an n-dimensional integral. For Poisson and Binomial ridge regression, we demonstrate the superior accuracy of the resulting MML estimator of λ as compared to CV. Software is provided to enable reproduction of all results. Gwenaël Leday was supported by the Medical Research Council, grant number MR/M004421
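
The link between the two variances and the ridge penalty mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard random-effects linear model (slopes with variance tau2, errors with variance sigma2), under which the ridge penalty is lambda = sigma2 / tau2. The heritability formula below, h2 = p*tau2 / (p*tau2 + sigma2) for standardized covariates, is one common convention and an assumption here; it may differ from the definition used in the paper. The function names are hypothetical and this is not the authors' software.

```python
import numpy as np

def ridge_from_variances(X, y, tau2, sigma2):
    """Ridge estimate with the penalty implied by the variance components.

    Under y = X beta + eps, beta_j ~ N(0, tau2), eps_i ~ N(0, sigma2),
    the posterior mean of beta is the ridge estimator with
    lambda = sigma2 / tau2.
    """
    lam = sigma2 / tau2
    p = X.shape[1]
    beta_hat = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
    return lam, beta_hat

def heritability(tau2, sigma2, p):
    # ASSUMED definition (standardized covariates); not taken from the paper.
    return p * tau2 / (p * tau2 + sigma2)

# Simulated high-dimensional example (p > n); all values are illustrative.
rng = np.random.default_rng(0)
n, p = 50, 200
tau2, sigma2 = 0.01, 1.0
X = rng.standard_normal((n, p))
beta = rng.normal(0.0, np.sqrt(tau2), p)
y = X @ beta + rng.normal(0.0, np.sqrt(sigma2), n)

lam, beta_hat = ridge_from_variances(X, y, tau2, sigma2)
h2 = heritability(tau2, sigma2, p)
```

In practice tau2 and sigma2 are unknown; the abstract's point is that estimating them (for example by MML) yields lambda directly, as an alternative to choosing lambda by cross-validation.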

Apollo (Cambridge)

Optimal Price Regulation for Natural and Legal Monopolies

Author: Ingo Vogelsang

Optimal price regulation for natural and legal monopolies is an impossible task. The still difficult task of good price regulation can be systematized by considering separately the price level and the price structure of the regulated firm. Various methods of price level and price structure regulation are evaluated and then considered for the regulation of electricity transmission, both in the context of an independent transmission company and of vertical integration between transmission and most of the generation capacity. The regulatory approach suggested uses price caps defined on two-part tariffs. This way, flexibility for short-term capacity utilization can be combined with incentives for investments in new transmission capacity.

Research Papers in Economics

Varying-Coefficient Models and Functional Data Analyses for Dynamic Networks and Wearable Device Data

Author: Lee Jihui
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2018

As more data are observed over time, investigating the variation across time has become a vital part of analyzing such data. In this dissertation, we discuss varying-coefficient models and functional data analysis methods for temporally heterogeneous data. More specifically, we examine two different types of temporal heterogeneity. The first type of temporal heterogeneity stems from the temporal evolution of relational patterns over time. Dynamic networks are commonly used when relational data are observed over time. Unlike static network analysis, dynamic network analysis emphasizes the importance of recognizing the temporal evolution of relationships among observations. We propose and investigate a family of dynamic network models, known as the varying-coefficient exponential random graph model (VCERGM), that characterize the evolution of network topology through smoothly varying parameters. The VCERGM directly provides an interpretable dynamic network model that enables the inference of temporal heterogeneity in dynamic networks. Furthermore, we introduce a method that analyzes multilevel dynamic networks. If there exist multiple relational data observed at one time point, it is reasonable to additionally consider the variability among the repeated observations at each time point. The proposed method is an extension of stochastic blockmodels with a priori block membership and temporal random effects. It incorporates the variability among multiple relational structures at one time point and provides a richer representation of dependent engagement patterns at each time point. The method is also flexible in analyzing networks with time-varying networks. Its smooth parameters can be interpreted as the evolving strength of engagement within and across blocks. The second type of temporal heterogeneity is motivated by temporal shifts in continuously observed data.
When multiple curves are obtained and there exists a common curvature shared by all the observed curves, understanding the common curvature may involve a preprocessing step of managing temporal shifts among curves. We explore the properties of continuous in-shoe sensor recordings to understand the source of variability in gait data. Our case study is based on measurements of three healthy subjects. The in-shoe sensor data we explore show both phase and amplitude variabilities; we separate these sources via curve registration. We examine the correlation of temporal shifts across sensors to evaluate the pattern of phase variability shared across sensors. We apply a series of functional data analysis approaches to the registered in-shoe sensor curves to examine their association with the current gold-standard gait measurement, the so-called ground reaction force.

Columbia University Academic Commons

Forward Selection Component Analysis: Algorithms and Applications

Author: McLoone Sean
Puggini Luca
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/12/2017

Queen's University Belfast Research Portal

Crossref

Learning nonparametric DAGs with incremental information via high-order HSIC

Author: Liu Jianguo
Wang Yafei
Publication date: 14/09/2023

Score-based methods for learning Bayesian networks (BN) aim to maximize global score functions. However, if local variables have direct and indirect dependence simultaneously, the global optimization of score functions misses edges between variables with an indirect dependence relationship, whose scores are smaller than those of variables with a direct dependence relationship. In this paper, we present an identifiability condition based on a determined subset of parents to identify the underlying DAG. Using this identifiability condition, we develop a two-phase algorithm, the optimal-tuning (OT) algorithm, to locally amend the global optimization. In the optimal phase, an optimization problem based on the first-order Hilbert-Schmidt independence criterion (HSIC) gives an estimated skeleton as the initial determined parents subset. In the tuning phase, the skeleton is locally tuned by deletion, addition and DAG-formalization strategies using the theoretically proved incremental properties of high-order HSIC. Numerical experiments on different synthetic and real-world datasets show that the OT algorithm outperforms existing methods. In particular, in the Sigmoid Mix model with graph size d = 40, the structure intervention distance (SID) of the OT algorithm is 329.7 smaller than the one obtained by CAM, which indicates that the graph estimated by the OT algorithm misses fewer edges compared with CAM. Source code of the OT algorithm is available at https://github.com/YafeiannWang/optimal-tune-algorithm
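
For readers unfamiliar with HSIC, the following is a minimal sketch of the standard biased empirical HSIC estimator with Gaussian kernels, trace(K H L H) / (n - 1)^2. It is assumed here to correspond to the pairwise (first-order) criterion named in the abstract; it is not the authors' OT algorithm and does not cover the high-order HSIC used in the tuning phase. The function names and the fixed bandwidth are illustrative choices.

```python
import numpy as np

def gaussian_gram(x, bandwidth=1.0):
    """Gaussian-kernel Gram matrix for samples x (1-D or n-by-d array)."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    d2 = np.maximum(d2, 0.0)  # guard against tiny negative round-off
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def hsic(x, y, bandwidth=1.0):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2."""
    n = len(x)
    K = gaussian_gram(x, bandwidth)
    L = gaussian_gram(y, bandwidth)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

# Illustrative data: a nonlinear dependence that linear correlation would miss.
rng = np.random.default_rng(0)
x = rng.standard_normal(200)
y_dep = x ** 2 + 0.1 * rng.standard_normal(200)  # depends on x
y_ind = rng.standard_normal(200)                 # independent of x

hsic_dep = hsic(x, y_dep)
hsic_ind = hsic(x, y_ind)
```

Larger values indicate stronger dependence; a score-based structure learner can use such a quantity to decide whether an edge between two variables is supported by the data.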

arXiv.org e-Print Archive

Improved fMRI-based Pain Prediction using Bayesian Group-wise Functional Registration

Author: Datta Abhirup
Lindquist Martin A.
Wang Guoqing
Publication date: 15/09/2022

In recent years, neuroimaging has undergone a paradigm shift, moving away from the traditional brain mapping approach toward developing integrated, multivariate brain models that can predict categories of mental events. However, large interindividual differences in brain anatomy and functional localization after standard anatomical alignment remain a major limitation in performing this analysis, as they lead to feature misalignment across subjects in subsequent predictive models.

arXiv.org e-Print Archive

Stratified Staged Trees: Modelling, Software and Applications

Author: Carli Federico
Publication venue: Università degli studi di Genova
Publication date: 22/10/2021

The thesis is focused on Probabilistic Graphical Models (PGMs), which are a rich framework for encoding probability distributions over complex domains. In particular, joint multivariate distributions over large numbers of random variables that interact with each other can be investigated through PGMs, and conditional independence statements can be succinctly represented with graphical representations. These representations sit at the intersection of statistics and computer science, relying on concepts mainly from probability theory, graph algorithms and machine learning. They are applied in a wide variety of fields, such as medical diagnosis, image understanding, speech recognition, natural language processing, and many more. Over the years, theory and methodology have developed and been extended in a multitude of directions. In particular, in this thesis different aspects of new classes of PGMs called Staged Trees and Chain Event Graphs (CEGs) are studied. In some sense, Staged Trees are a generalization of Bayesian Networks (BNs). Indeed, BNs provide a transparent graphical tool to define a complex process in terms of conditional independence structures. Despite their strengths in allowing for the reduction in the dimensionality of joint probability distributions of the statistical model and in providing a transparent framework for causal inference, BNs are not optimal graphical models in all situations. The biggest problems with their usage mainly occur when the event space is not a simple product of the sample spaces of the random variables of interest, and when conditional independence statements are true only under certain values of variables. This happens when there are context-specific conditional independence structures. Some extensions to the BN framework have been proposed to handle these issues: context-specific BNs, Bayesian Multinets, or Similarity Networks (Geiger and Heckerman, 1996).
These adopt a hypothesis variable to encode the context-specific statements over a particular set of random variables. For each value taken by the hypothesis variable, the graphical modeller has to construct a particular BN model called a local network. The collection of these local networks constitutes a Bayesian Multinet, Probabilistic Decision Graphs, among others. It has been shown that Chain Event Graph (CEG) models encompass all discrete BN models and their discrete variants described above as a special subclass, and they are also richer than Probabilistic Decision Graphs, whose semantics is actually somewhat distinct. Unlike most of their competitors, CEGs can capture all (also context-specific) conditional independences in a unique graph, obtained by a coalescence over the vertices of an appropriately constructed probability tree, called a Staged Tree. CEGs have been developed for categorical variables and have been used for cohort studies, causal analysis and case-control studies. The user’s toolbox to efficiently and effectively perform uncertainty reasoning with CEGs further includes methods for inference and probability propagation, the exploration of equivalence classes and robustness studies. The main contributions of this thesis to the literature on Staged Trees are related to Stratified Staged Trees, with a keen eye on application. A few observations are made on non-Stratified Staged Trees in the last part of the thesis. A core output of the thesis is an R software package which efficiently implements a host of functions for learning and estimating Staged Trees from data, relying on likelihood principles. Structural learning algorithms are also developed, based on the distance or divergence between pairs of categorical probability distributions and on the clustering of probability distributions into a fixed number of stages for each stratum of the tree.
A new class of Directed Acyclic Graph is also introduced, named the Asymmetric-labeled DAG (ALDAG), which gives a BN representation of a given Staged Tree. The ALDAG is a minimal DAG such that the statistical model embedded in the Staged Tree is contained in the one associated with the ALDAG. This is possible thanks to the use of colored edges, so that each color indicates a different type of conditional dependence: total, context-specific, partial or local. Staged Trees are also adopted in this thesis as a statistical tool for classification purposes. Staged Tree Classifiers are introduced, which exhibit predictive accuracy comparable to that of state-of-the-art machine learning algorithms such as neural networks and random forests. Finally, algorithms to obtain an ordering of variables for the construction of the Staged Tree are designed.

Archivio istituzionale della ricerca - Università di Genova

Wavelet and Multiscale Methods

Publication venue: Zürich : EMS Publ. House
Publication date: 01/01/2004

[no abstract available]

Repositorium für Naturwissenschaften und Technik

Time and frequency domain statistical methods for high-frequency time series

Author: Elayouty Amira Sherif Mohamed
Publication date: 01/01/2017

Advances in sensor technology enable environmental monitoring programmes to record and store measurements at high-temporal resolution over long time periods. These large volumes of high-frequency data promote an increasingly comprehensive picture of many environmental processes that would not have been accessible in the past with monthly, fortnightly or even daily sampling. However, benefiting from these increasing amounts of high-frequency data presents various challenges in terms of data processing and statistical modeling using standard methods and software tools. These challenges are attributed to the large volumes of data, the persistent and long memory serial correlation in the data, the signal to noise ratio, and the complex and time-varying dynamics and inter-relationships between the different drivers of the process at different timescales. This thesis aims at using and developing a variety of statistical methods in both the time and frequency domains to effectively explore and analyze high-frequency time series data as well as to reduce their dimensionality, with specific application to a 3 year hydrological time series. Firstly, the thesis investigates the statistical challenges of exploring, modeling and analyzing these large volumes of high-frequency time series. Thereafter, it uses and develops more advanced statistical techniques to: (i) better visualize and identify the different modes of variability and common patterns in such data, and (ii) provide a more adequate dimension reduction representation to the data, which takes into account the persistent serial dependence structure and non-stationarity in the series. Throughout the thesis, a 15-minute resolution time series of excess partial pressure of carbon dioxide (EpCO2) obtained for a small catchment in the River Dee in Scotland has been used as an illustrative data set. 
Understanding the bio-geochemical and hydrological drivers of EpCO2 is very important to the assessment of the global carbon budget. Specifically, Chapters 1 and 2 present a range of advanced statistical approaches in both the time and frequency domains, including wavelet analysis and additive models, to visualize and explore temporal variations and relationships between variables for the River Dee data across the different timescales to investigate the statistical challenges posed by such data. In Chapter 3, a functional data analysis approach is employed to identify the common daily patterns of EpCO2 by means of functional principal component analysis and functional cluster analysis. The techniques used in this chapter assume independent functional data. However, in numerous applications, functional observations are serially correlated over time, e.g. where each curve represents a segment of the whole time interval. In this situation, ignoring the temporal dependence may result in an inappropriate dimension reduction of the data and inefficient inference procedures. Subsequently, the dynamic functional principal components, recently developed by Hörmann et al. (2014), are considered in Chapter 4 to account for the temporal correlation using a frequency domain approach. A specific contribution of this thesis is the extension of the methodology of dynamic functional principal components to temporally dependent functional data estimated using any type of basis functions, not only orthogonal basis functions. Based on the scores of the proposed general version of dynamic functional principal components, a novel clustering approach is proposed and used to cluster the daily curves of EpCO2 taking into account the dependence structure in the data. The dynamic functional principal components depend in their construction on the assumption of second-order stationarity, which is not a realistic assumption in most environmental applications.
Therefore, in Chapter 5, a second specific contribution of this thesis is the development of time-varying dynamic functional principal components, which allow the components to vary smoothly over time. The performance of these smooth dynamic functional principal components is evaluated empirically using the EpCO2 data and using a simulation study. The simulation study compares the performance of smooth and original dynamic functional principal components under both stationary and non-stationary conditions. The smooth dynamic functional principal components have shown considerable improvement in representing non-stationary dependent functional data in smaller dimensions. Using a bootstrap inference procedure, the smooth dynamic functional principal components have been subsequently employed to investigate whether or not the spectral density and covariance structure of the functional time series under study change over time. To account for the possible changes in the covariance structure, a clustering approach based on the proposed smooth dynamic functional principal components is suggested and the results of its application are discussed. Finally, Chapter 6 provides a summary of the work presented within this thesis, discusses the limitations and implications, and proposes areas for future research.

Glasgow Theses Service