Search CORE

87 research outputs found

From patterned response dependency to structured covariate dependency: categorical-pattern-matching

Author: Fushing Hsieh
Hsieh Yin-Chen
Liu Shan-Yu
McCowan Brenda
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 31/05/2017

Data generated from a system of interest typically consists of measurements from an ensemble of subjects across multiple response and covariate features, and is naturally represented by one response matrix against one covariate matrix. Each of these two matrices likely embraces heterogeneous data types simultaneously: continuous, discrete and categorical. Here a matrix is used as a practical platform that ideally keeps hidden dependency among and between subjects and features intact on its lattice. Response and covariate dependency are individually computed and expressed through multiscale blocks via a newly developed computing paradigm named Data Mechanics. We propose a categorical pattern matching approach to establish causal linkages in the form of information flows from patterned response dependency to structured covariate dependency. The strength of an information flow is evaluated by applying combinatorial information theory. This unified platform for system knowledge discovery is illustrated through five data sets. In each illustrative case, an information flow is demonstrated as an organization of discovered knowledge loci via emergent visible and readable heterogeneity. This unified approach fundamentally resolves many long-standing issues in data analysis, including statistical modeling, multiple response, renormalization and feature selection, without involving man-made structures and distribution assumptions. The results reported here reinforce the idea that linking patterns of response dependency to structures of covariate dependency is the true philosophical foundation underlying data-driven computing and learning in sciences.
Comment: 32 pages, 10 figures, 3 box pictures

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

eScholarship - University of California

FigShare

Asymptotic Properties of Multi-Treatment Covariate Adaptive Randomization Procedures for Balancing Observed and Unobserved Covariates

Author: Zhang Li-Xin
Publication date: 23/05/2023

Applications of CAR for balancing continuous covariates remain comparatively rare, especially in multi-treatment clinical trials, and the theoretical properties of multi-treatment CAR have remained largely elusive for decades. In this paper, we consider a general framework of CAR procedures for multi-treatment clinical trials which can balance general covariate features, such as quadratic and interaction terms, which can be discrete, continuous, or mixed. We show that under widely satisfied conditions the proposed procedures have superior balancing properties; in particular, the convergence rate of imbalance vectors can attain the best rate $O_P(1)$ for discrete covariates, continuous covariates, or combinations of both discrete and continuous covariates, and at the same time, the convergence rate of the imbalance of unobserved covariates is $O_P(\sqrt{n})$, where $n$ is the sample size. The general framework unifies many existing methods and related theories, introduces a much broader class of new and useful CAR procedures, and provides new insights and a complete picture of the properties of CAR procedures. The favorable balancing properties lead to the precision of the treatment effect test in the presence of a heteroscedastic linear model with dependent covariate features. As an application, the properties of the test of treatment effect with unobserved covariates are studied under the CAR procedures, and consistent tests are proposed so that the test has an asymptotically precise type I error even if the working model is wrong and covariates are unobserved in the analysis.
Comment: 102 pages
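To make the notion of an imbalance vector concrete, here is a minimal sketch of a generic minimization-style biased-coin rule for several treatments. It is not the procedure proposed in the paper: the function names `assign_car` and `imbalance_matrix`, the probability `p_best`, and the squared-norm imbalance score are all assumptions chosen for illustration.

```python
import numpy as np

def imbalance_matrix(features, assignments, n_treatments):
    """Row k is sum_i (1[T_i = k] - 1/K) * phi(X_i) for treatment k."""
    centered = np.eye(n_treatments)[assignments] - 1.0 / n_treatments
    return centered.T @ features

def assign_car(features, n_treatments=3, p_best=0.85, seed=0):
    """Sequentially assign subjects, favoring the arm that minimizes imbalance.

    Hypothetical rule: with probability p_best pick the arm giving the
    smallest squared imbalance, otherwise pick an arm uniformly at random.
    """
    rng = np.random.default_rng(seed)
    n, d = features.shape
    imbalance = np.zeros((n_treatments, d))
    assignments = np.empty(n, dtype=int)
    for i in range(n):
        phi = features[i]
        scores = np.empty(n_treatments)
        for k in range(n_treatments):
            # Change in the imbalance matrix if subject i went to arm k.
            delta = np.full(n_treatments, -1.0 / n_treatments)
            delta[k] += 1.0
            scores[k] = np.sum((imbalance + np.outer(delta, phi)) ** 2)
        if rng.random() < p_best:
            chosen = int(np.argmin(scores))
        else:
            chosen = int(rng.integers(n_treatments))
        delta = np.full(n_treatments, -1.0 / n_treatments)
        delta[chosen] += 1.0
        imbalance += np.outer(delta, phi)
        assignments[i] = chosen
    return assignments

# Usage: balance an intercept, a continuous covariate, and its square.
rng = np.random.default_rng(1)
x = rng.normal(size=1000)
features = np.column_stack([np.ones_like(x), x, x ** 2])
car = assign_car(features)
print(np.abs(imbalance_matrix(features, car, 3)).max())
```

Comparing the printed value with the same quantity under purely random assignment gives a rough feel for the difference between bounded and square-root-of-n imbalance, although a single simulation is no substitute for the asymptotic results.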

arXiv.org e-Print Archive

Open Set Domain Adaptation using Optimal Transport

Author: Alaya Mokhtar Z.
Gasso Gilles
Hérault Romain
Kechaou Marwa
Publication date: 14/09/2020

We present a 2-step optimal transport approach that performs a mapping from a source distribution to a target distribution. Here, the target has the particularity of presenting new classes not present in the source domain. The first step of the approach aims at rejecting the samples issued from these new classes using an optimal transport plan. The second step solves the target (class ratio) shift, again as an optimal transport problem. We develop a dual approach to solve the optimization problem involved at each step and we prove that our results outperform recent state-of-the-art performances. We further apply the approach to the setting where the source and target distributions present both a label shift and an increasing covariate (feature) shift to show its robustness.
Comment: Accepted at ECML-PKDD 2020, Acknowledgements added
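For readers unfamiliar with transport plans, the following is a minimal sketch of an entropy-regularized optimal transport plan computed with Sinkhorn iterations. It only illustrates the basic object both steps rely on; it does not reproduce the authors' rejection step or their dual solver. The function name `sinkhorn_plan`, the uniform sample weights, the squared Euclidean cost, and the regularization value are assumptions.

```python
import numpy as np

def sinkhorn_plan(source, target, reg=0.5, n_iter=200):
    """Entropic OT plan between two point clouds with uniform weights."""
    a = np.full(len(source), 1.0 / len(source))   # source weights (assumed uniform)
    b = np.full(len(target), 1.0 / len(target))   # target weights (assumed uniform)
    # Pairwise squared Euclidean cost, scaled to [0, 1] for numerical stability.
    diff = source[:, None, :] - target[None, :, :]
    cost = np.sum(diff ** 2, axis=2)
    cost = cost / cost.max()
    kernel = np.exp(-cost / reg)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (kernel.T @ u)    # match the target marginal
        u = a / (kernel @ v)      # match the source marginal
    return u[:, None] * kernel * v[None, :]

# Usage: entry (i, j) is the mass moved from source sample i to target sample j.
rng = np.random.default_rng(0)
xs = rng.normal(size=(5, 2))
xt = rng.normal(loc=1.0, size=(7, 2))
plan = sinkhorn_plan(xs, xt)
print(plan.sum(axis=1), plan.sum(axis=0))
```

Each row of the plan describes how one source sample's mass is spread over the target samples, which is the kind of information a rejection or reweighting rule can then draw on.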

arXiv.org e-Print Archive

HAL - Normandie Université

Categorical Exploratory Data Analysis: From Multiclass Classification and Response Manifold Analytics perspectives of baseball pitching dynamics

Author: Chou Elizabeth P.
Hsieh Fushing
Publication venue: 'MDPI AG'
Publication date: 25/06/2020

From two coupled perspectives, Multiclass Classification (MCC) and Response Manifold Analytics (RMA), we develop Categorical Exploratory Data Analysis (CEDA) on the PITCHf/x database for the information content of Major League Baseball's (MLB) pitching dynamics. MCC and RMA information contents are represented by one collection of multiscale pattern categories from mixing geometries and one collection of global-to-local geometric localities from response-covariate manifolds, respectively. These collections shed light on the pitching dynamics and map out the uncertainty of popular machine learning approaches. In the MCC setting, a label embedding tree based on an indirect distance measure leads to the discovery of asymmetry of mixing geometries among the labels' point clouds. A selected chain of complementary covariate feature groups collectively brings out multi-order mixing geometric pattern categories. Such categories then reveal the true nature of MCC predictive inferences. In the RMA setting, multiple response features couple with multiple major covariate features to demonstrate manifolds that bear physical principles and carry a lattice of natural localities. With minor features' heterogeneous effects being locally identified, such localities jointly weave their focal characteristics into system understanding and provide a platform for RMA predictive inferences. Our CEDA works for universal data types, adopts non-linear associations and facilitates efficient feature selection and inferences.

arXiv.org e-Print Archive

Multidisciplinary Digital Publishing Institute

PubMed Central

Bayesian analysis of the linear reaction norm model with unknown covariate

Author: Jensen J.
Korsgaard I.R.
Lund M.S.
Madsen P.
Su G.
Sørensen D.
Publication date: 01/01/2006

The reaction norm model is becoming a popular approach for the analysis of G x E interactions. In a classical reaction norm model, the expression of a genotype in different environments is described as a linear function (a reaction norm) of an environmental gradient or value. A common environmental value is defined as the mean performance of all genotypes in the environment, which is typically unknown. One approximation is to estimate the mean phenotypic performance in each environment, and then treat these estimates as known covariates in the model. However, a more satisfactory alternative is to infer environmental values simultaneously with the other parameters of the model. This study describes a method and its Bayesian MCMC implementation that make this possible. Frequentist properties of the proposed method are tested in a simulation study. Estimates of parameters of interest agree well with the true values. Further, inferences about genetic parameters from the proposed method are similar to those derived from a reaction norm model using true environmental values. On the other hand, using phenotypic means as proxies for environmental values results in poor inferences.
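The two-stage approximation mentioned above (phenotypic means used as known covariates) can be sketched in a few lines. This is only the simple proxy approach, not the Bayesian MCMC method the study describes. The function name `fit_reaction_norms` and the balanced genotype-by-environment layout are assumptions.

```python
import numpy as np

def fit_reaction_norms(phenotypes):
    """Fit y[g, e] = a_g + b_g * env[e] using environment means as env.

    phenotypes: array of shape (n_genotypes, n_environments), assumed to
    hold one observed performance per genotype and environment.
    """
    # Stage 1: proxy each environmental value by the mean over genotypes.
    env_proxy = phenotypes.mean(axis=0)
    env_centered = env_proxy - env_proxy.mean()
    # Stage 2: ordinary least squares per genotype, treating the proxy as known.
    design = np.column_stack([np.ones_like(env_centered), env_centered])
    coef, *_ = np.linalg.lstsq(design, phenotypes.T, rcond=None)
    intercepts, slopes = coef[0], coef[1]
    return intercepts, slopes, env_proxy

# Usage on simulated data with known slopes.
rng = np.random.default_rng(0)
env = rng.normal(size=8)
a = rng.normal(size=5)
b = 1.0 + 0.3 * rng.normal(size=5)
y = a[:, None] + b[:, None] * env[None, :] + 0.1 * rng.normal(size=(5, 8))
intercepts, slopes, proxy = fit_reaction_norms(y)
print(slopes, slopes.mean())
```

Because the proxy is itself the average of the fitted responses, the estimated slopes in this balanced layout average to exactly one whatever the true slopes are, which is one simple way to see why treating phenotypic means as known covariates can distort inference.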

Organic Eprints