
    An Efficient Alternating Riemannian/Projected Gradient Descent Ascent Algorithm for Fair Principal Component Analysis

    Fair principal component analysis (FPCA), a dimensionality reduction technique widely used in signal processing and machine learning, aims to find a low-dimensional representation of a high-dimensional dataset while accounting for fairness. The FPCA problem involves optimizing a non-convex and non-smooth function over the Stiefel manifold. The state-of-the-art methods for solving the problem are subgradient methods and semidefinite relaxation-based methods. However, both types of methods have clear limitations and are only suitable for efficiently solving the FPCA problem in special scenarios. This paper aims at developing efficient algorithms for solving the FPCA problem in general, especially large-scale, settings. We first transform FPCA into a smooth nonconvex-linear minimax optimization problem over the Stiefel manifold. To solve this general problem, we propose an efficient alternating Riemannian/projected gradient descent ascent (ARPGDA) algorithm, which performs a Riemannian gradient descent step and an ordinary projected gradient ascent step at each iteration. We prove that ARPGDA finds an $\varepsilon$-stationary point of the above problem within $\mathcal{O}(\varepsilon^{-3})$ iterations. Simulation results show that, compared with the state-of-the-art methods, ARPGDA achieves better solution quality and speed for solving FPCA problems.

    Comment: 5 pages, 8 figures, submitted for possible publication
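
    The alternating structure described in the abstract is simple to sketch. Below is a minimal NumPy illustration of such a Riemannian/projected GDA loop, assuming per-group objectives of the form f_i(X) = -tr(X'A_iX) for group covariance matrices A_i, a QR retraction, and a sort-based simplex projection; the step sizes, the stand-in objectives, and the initialization are illustrative assumptions, not the paper's exact FPCA formulation.

```python
import numpy as np

def qr_retraction(X, V):
    # Retract the step X + V back onto the Stiefel manifold via QR.
    Q, R = np.linalg.qr(X + V)
    return Q * np.sign(np.sign(np.diag(R)) + 0.5)  # fix column signs

def project_simplex(v):
    # Euclidean projection onto the probability simplex (sort-based).
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(v)) + 1.0) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1.0)
    return np.maximum(v + theta, 0.0)

def arpgda(covs, d, r, eta_x=1e-2, eta_y=1e-1, iters=500):
    """Sketch of min_{X in St(d,r)} max_{y in simplex} sum_i y_i * f_i(X),
    with f_i(X) = -tr(X'A_iX) as a stand-in for per-group objectives."""
    rng = np.random.default_rng(0)
    X, _ = np.linalg.qr(rng.standard_normal((d, r)))
    y = np.full(len(covs), 1.0 / len(covs))
    for _ in range(iters):
        # Euclidean gradient of the y-weighted objective at X.
        G = -2.0 * sum(w * A @ X for w, A in zip(y, covs))
        # Riemannian gradient: project G onto the tangent space at X.
        sym = (X.T @ G + G.T @ X) / 2.0
        X = qr_retraction(X, -eta_x * (G - X @ sym))
        # Ordinary projected gradient ascent step on the weights y.
        f = np.array([-np.trace(X.T @ A @ X) for A in covs])
        y = project_simplex(y + eta_y * f)
    return X, y
```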

    Fair Streaming Principal Component Analysis: Statistical and Algorithmic Viewpoint

    Fair Principal Component Analysis (PCA) is a problem setting in which we aim to perform PCA while making the resulting representation fair, in the sense that the projected distributions, conditional on the sensitive attributes, match one another. However, existing approaches to fair PCA have two main problems: theoretically, there has been no statistical foundation of fair PCA in terms of learnability; practically, limited memory prevents us from using existing approaches, as they explicitly rely on full access to the entire dataset. On the theoretical side, we rigorously formulate fair PCA using a new notion called probably approximately fair and optimal (PAFO) learnability. On the practical side, motivated by recent advances in streaming algorithms for addressing memory limitations, we propose a new setting called fair streaming PCA along with a memory-efficient algorithm, the fair noisy power method (FNPM). We then provide its statistical guarantee in terms of PAFO-learnability, the first of its kind in the fair PCA literature. Lastly, we verify the efficacy and memory efficiency of our algorithm on real-world datasets.

    Comment: 42 pages, 5 figures, 4 tables. Accepted to the 37th Conference on Neural Information Processing Systems (NeurIPS 2023)
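
    As a rough illustration of the memory argument, here is a minimal streaming power method in NumPy: only a d-by-r sketch is kept while mini-batches arrive. This shows only the noisy-power-method backbone; the fairness-aware combination of group-conditional statistics that distinguishes the paper's FNPM is not reproduced here.

```python
import numpy as np

def streaming_power_method(batches, d, r):
    """Keep only an O(d*r) sketch Q; each mini-batch supplies a noisy
    covariance estimate that drives one power iteration."""
    rng = np.random.default_rng(0)
    Q, _ = np.linalg.qr(rng.standard_normal((d, r)))
    for X in batches:                   # X: (n_b, d) rows of samples
        A_hat = X.T @ X / len(X)        # noisy covariance estimate
        Q, _ = np.linalg.qr(A_hat @ Q)  # power step + re-orthonormalize
    return Q                            # estimated top-r eigenspace
```

    In the fair setting, one would presumably split each batch by the sensitive attribute and combine the group-conditional second moments before the power step; the exact combination rule is the paper's contribution and is not guessed at here.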

    Creating Detailed Metadata for an R Shiny Analysis of Rodent Behavior Sequence Data Detected Along One Light-Dark Cycle

    Automated mouse phenotyping through the high-throughput analysis of home cage behavior has brought hope of a more effective and efficient method for testing rodent models of diseases. Advanced video analysis software is able to derive behavioral sequence datasets from multiple-day recordings. However, no dedicated mechanisms exist for sharing or analyzing these types of data. In this article, we present a free, open-source software tool, accessible through a web browser (an R Shiny application), that analyzes home cage behavioral sequence data and is designed to spot differences in circadian activity while preventing p-hacking. The software aligns time-series data to the light/dark cycle and then uses different time windows to produce up to 162 behavior variables per animal. A principal component analysis strategy is used to detect differences between groups. Behavioral activity is represented graphically for further exploratory analysis. A machine-learning approach was implemented, but it proved ineffective at separating the experimental groups. The software requires spreadsheets that provide information about the experiment (i.e., metadata), thus promoting a data management strategy that leads to FAIR data production. This encourages the publication of some metadata even when the data are kept private. We tested our software by comparing the behavior of female mice in videos recorded twice, at 3 and 7 months, in a home cage monitoring system. This study demonstrated that combining data management with data analysis leads to a more efficient and effective research process.
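
    To make the windowing idea concrete, here is a hypothetical Python sketch: event times are aligned to lights-on, the 24 h cycle is split into windows, per-window behavior counts form one feature vector per animal, and PC scores are computed from the stacked vectors. The onset hour, 4 h window length, and 27-behavior alphabet are illustrative assumptions (they happen to yield 6 x 27 = 162 variables, matching the count above only by construction).

```python
import numpy as np

def behavior_feature_vector(times_h, behaviors, light_onset_h=7.0,
                            window_h=4.0, n_behaviors=27):
    # Align event times (hours) to lights-on, then count each behavior
    # code (integers 0..n_behaviors-1) per window of the 24 h cycle.
    phase = (np.asarray(times_h) - light_onset_h) % 24.0
    win = (phase // window_h).astype(int)          # window index 0..5
    counts = np.zeros((int(24.0 / window_h), n_behaviors))
    np.add.at(counts, (win, np.asarray(behaviors)), 1.0)
    return counts.ravel()                          # 162 variables per animal

def pca_scores(F, k=2):
    # F: (n_animals, n_variables); return the first k PC scores.
    Fc = F - F.mean(axis=0)                        # center the features
    U, S, Vt = np.linalg.svd(Fc, full_matrices=False)
    return Fc @ Vt[:k].T
```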

    Covariance and PCA for Categorical Variables

    Covariances for categorical variables are defined using a regular simplex expression for the categories. The method follows Gini's definition of variance, and it gives the covariance as the solution of simultaneous equations. The calculated results give reasonable values for test data. A method of principal component analysis (RS-PCA) is also proposed using regular simplex expressions, which allows easy interpretation of the principal components. The proposed methods are applied to the variable selection problem on the USCensus1990 data and give an appropriate criterion for variable selection with categorical data.

    Comment: 12 pages, 5 figures
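
    One common regular-simplex construction is easy to sketch: one-hot vectors centered at their mean are equidistant points (the vertices of a regular simplex) in a (k-1)-dimensional subspace, and ordinary PCA can then run on the concatenated numeric codes. The scaling tied to Gini's variance definition in the paper may differ from this illustrative choice.

```python
import numpy as np

def simplex_codes(labels):
    # Centered one-hot rows: k equidistant points, i.e. a regular simplex
    # embedded in a (k-1)-dimensional subspace of R^k.
    cats, idx = np.unique(labels, return_inverse=True)
    E = np.eye(len(cats)) - 1.0 / len(cats)
    return E[idx]                        # (n, k) coordinates, rank k-1

def rs_pca(columns, r=2):
    # Concatenate simplex codes of several categorical columns and run
    # ordinary PCA on the resulting numeric matrix.
    Z = np.hstack([simplex_codes(c) for c in columns])
    Zc = Z - Z.mean(axis=0)
    U, S, Vt = np.linalg.svd(Zc, full_matrices=False)
    return Zc @ Vt[:r].T                 # principal component scores
```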

    Low-latency compression of mocap data using learned spatial decorrelation transform

    Due to the growing use of human motion capture (mocap) in movies, video games, sports, etc., it is highly desirable to compress mocap data for efficient storage and transmission. This paper presents two efficient frameworks for compressing human mocap data with low latency. The first framework processes the data frame by frame, making it ideal for mocap data streaming and time-critical applications. The second is clip-based and provides a flexible tradeoff between latency and compression performance. Since mocap data exhibit unique spatial characteristics, we propose a very effective transform, namely the learned orthogonal transform (LOT), for reducing spatial redundancy. The LOT problem is formulated as minimizing a squared error regularized by orthogonality and sparsity, and is solved via alternating iteration. We also adopt predictive coding and a temporal DCT for temporal decorrelation in the frame- and clip-based frameworks, respectively. Experimental results show that the proposed frameworks produce higher compression performance at lower computational cost and latency than the state-of-the-art methods.

    Comment: 15 pages, 9 figures
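
    The alternating iteration lends itself to a short sketch: with T constrained orthogonal, a squared error plus an l1 penalty on the coefficients separates into a soft-thresholding step for C and an orthogonal Procrustes step for T. This is one plausible reading of the formulation under those assumptions, not the paper's exact LOT solver.

```python
import numpy as np

def learn_orthogonal_transform(X, lam=0.1, iters=50):
    """Alternate on min_{T,C} ||X - T C||_F^2 + lam*||C||_1, T'T = I.
    X: (d, n) matrix of mocap frames, one pose vector per column."""
    d, n = X.shape
    T = np.eye(d)                       # start from the identity transform
    for _ in range(iters):
        # Coefficient step: since T is orthogonal, ||X - TC||^2 = ||T'X - C||^2,
        # so C is a soft-thresholded version of the analysis coefficients.
        C = T.T @ X
        C = np.sign(C) * np.maximum(np.abs(C) - lam / 2.0, 0.0)
        # Transform step: orthogonal Procrustes, T = argmin ||X - TC||_F.
        U, _, Vt = np.linalg.svd(X @ C.T)
        T = U @ Vt
    return T, C
```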

    Culture, beliefs and economic performance

    Beliefs are one component of culture. Data from the World Values Survey are available on a subset of beliefs concerning (broadly) meritocracy and poverty that appear relevant for economics. We document how these beliefs vary as well as their distribution across countries. We then correlate these measures of beliefs with economic growth and compare them with institutional and geographical determinants of income. A strong negative relationship is found between leftist economic beliefs and growth, but little evidence is found of a relationship with respect to non-economic beliefs. Finally, we briefly discuss some causal effects on beliefs. The evidence suggests that higher country risk and greater dependence on natural resources shift nations toward a more leftist set of economic beliefs. Overall, the evidence supports the view that cultural specificities may explain why certain institutions cannot be transplanted between nations with different cultural histories, and underlines the limits to policy activism.
