From patterned response dependency to structured covariate dependency: categorical-pattern-matching
Data generated from a system of interest typically consists of measurements
from an ensemble of subjects across multiple response and covariate features,
and is naturally represented by one response-matrix against one
covariate-matrix. Likely each of these two matrices simultaneously embraces
heterogeneous data types: continuous, discrete and categorical. Here a matrix
is used as a practical platform to ideally keep hidden dependency among/between
subjects and features intact on its lattice. Response and covariate dependency
is individually computed and expressed through multiscale blocks via a newly
developed computing paradigm named Data Mechanics. We propose a categorical
pattern matching approach to establish causal linkages in the form of information
flows from patterned response dependency to structured covariate dependency.
The strength of an information flow is evaluated by applying combinatorial
information theory. This unified platform for system knowledge discovery is
illustrated through five data sets. In each illustrative case, an information
flow is demonstrated as an organization of discovered knowledge loci via
emergent visible and readable heterogeneity. This unified approach
fundamentally resolves many long-standing issues in data analysis, including
statistical modeling, multiple response, renormalization and feature selection,
without involving man-made structures or distribution
assumptions. The results reported here enhance the idea that linking patterns
of response dependency to structures of covariate dependency is the true
philosophical foundation underlying data-driven computing and learning in
sciences.
Comment: 32 pages, 10 figures, 3 box picture
Asymptotic Properties of Multi-Treatment Covariate Adaptive Randomization Procedures for Balancing Observed and Unobserved Covariates
Applications of CAR for balancing continuous covariates remain comparatively
rare, especially in multi-treatment clinical trials, and the theoretical
properties of multi-treatment CAR have remained largely elusive for decades. In
this paper, we consider a general framework of CAR procedures for
multi-treatment clinical trials which can balance general covariate features,
such as quadratic and interaction terms, which can be discrete, continuous, or
mixed. We show that under widely satisfied conditions the proposed procedures
have superior balancing properties; in particular, the convergence rate of
imbalance vectors can attain the best rate $O_P(1)$ for discrete covariates,
continuous covariates, or combinations of both discrete and continuous
covariates, and at the same time, the convergence rate of the imbalance of
unobserved covariates is $O_P(\sqrt{n})$, where $n$ is the sample size. The
general framework unifies many existing methods and related theories,
introduces a much broader class of new and useful CAR procedures, and provides
new insights and a complete picture of the properties of CAR procedures. The
favorable balancing properties lead to the precision of the treatment effect
test in the presence of a heteroscedastic linear model with dependent covariate
features. As an application, the properties of the test of treatment effect
with unobserved covariates are studied under the CAR procedures, and consistent
tests are proposed so that the test has an asymptotically precise type I error even
if the working model is wrong and covariates are unobserved in the analysis.
Comment: 102 page
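The idea of covariate-adaptive randomization described in the abstract above can be illustrated with a deliberately simplified sketch. It is not the procedure proposed in the paper: it assumes a single discrete covariate (a stratum label), measures imbalance as the within-stratum range of arm counts, and uses a biased-coin rule. The names `car_assign`, `max_imbalance` and `p_best` are hypothetical.

```python
import random
from collections import defaultdict


def car_assign(strata, n_arms=3, p_best=0.85, seed=0):
    """Assign subjects to arms one at a time (illustrative sketch only).

    With probability p_best the subject goes to an arm that is currently
    least represented in the subject's stratum; otherwise the subject goes
    to one of the remaining arms, chosen uniformly at random.
    """
    rng = random.Random(seed)
    counts = defaultdict(lambda: [0] * n_arms)  # stratum -> per-arm counts
    assignments = []
    for s in strata:
        c = counts[s]
        least = min(c)
        best = [a for a in range(n_arms) if c[a] == least]
        rest = [a for a in range(n_arms) if c[a] != least]
        if rest and rng.random() >= p_best:
            arm = rng.choice(rest)   # occasional move away from balance
        else:
            arm = rng.choice(best)   # move towards balance
        c[arm] += 1
        assignments.append(arm)
    return assignments, dict(counts)


def max_imbalance(counts):
    """Largest within-stratum gap between the most and least used arm."""
    return max(max(c) - min(c) for c in counts.values())


if __name__ == "__main__":
    strata = [i % 4 for i in range(200)]  # assumed: 4 strata, 200 subjects
    _, counts = car_assign(strata)
    print(counts, max_imbalance(counts))
```

Setting `p_best=1.0` makes the rule deterministic up to tie-breaking, in which case the arm counts within each stratum never differ by more than one. Balancing continuous covariates or unobserved covariates, as studied in the abstract, needs a richer imbalance measure than this sketch uses.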
Open Set Domain Adaptation using Optimal Transport
We present a 2-step optimal transport approach that performs a mapping from a
source distribution to a target distribution. Here, the target has the
particularity to present new classes not present in the source domain. The
first step of the approach aims at rejecting the samples issued from these new
classes using an optimal transport plan. The second step solves the target
(class ratio) shift still as an optimal transport problem. We develop a dual
approach to solve the optimization problem involved at each step and we prove
that our results outperform recent state-of-the-art performances. We further
apply the approach to the setting where the source and target distributions
present both a label-shift and an increasing covariate (features) shift to show
its robustness.
Comment: Accepted at ECML-PKDD 2020, Acknowledgements adde
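As a generic illustration of the building block named in the abstract above, the sketch below computes an entropy-regularised optimal transport plan with Sinkhorn iterations and then reports the average transport cost paid by each target point. This is a standard textbook technique, not the authors' dual algorithm or their rejection rule; the function names, the squared-distance cost and the toy 1-D data are all assumptions made for the example.

```python
import math


def sinkhorn(a, b, cost, reg=0.5, n_iter=200):
    """Entropy-regularised OT plan between weight vectors a and b.

    cost[i][j] is the cost of moving mass from source i to target j.
    Returns the plan P as a list of rows.
    """
    n, m = len(a), len(b)
    K = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


def cost_per_target(plan, cost):
    """Average cost of the mass received by each target point."""
    n, m = len(plan), len(plan[0])
    out = []
    for j in range(m):
        mass = sum(plan[i][j] for i in range(n))
        out.append(sum(plan[i][j] * cost[i][j] for i in range(n)) / mass)
    return out


if __name__ == "__main__":
    source = [0.0, 1.0]        # assumed 1-D source samples
    target = [0.1, 0.9, 5.0]   # the last point lies far from the source
    cost = [[(s - t) ** 2 for t in target] for s in source]
    plan = sinkhorn([0.5, 0.5], [1 / 3, 1 / 3, 1 / 3], cost)
    print(cost_per_target(plan, cost))
```

In this toy setup the target point at 5.0 can only be reached at a high cost, which gives some intuition for why a transport plan can help flag target samples that have no counterpart in the source. How the paper turns that intuition into a rejection step is not specified in the abstract.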
Categorical Exploratory Data Analysis: From Multiclass Classification and Response Manifold Analytics perspectives of baseball pitching dynamics
From two coupled Multiclass Classification (MCC) and Response Manifold
Analytics (RMA) perspectives, we develop Categorical Exploratory Data Analysis
(CEDA) on PITCHf/x database for the information content of Major League
Baseball's (MLB) pitching dynamics. MCC and RMA information contents are
represented by one collection of multiscale pattern categories from mixing
geometries and one collection of global-to-local geometric localities from
response-covariate manifolds, respectively. These collectives shed light on the
pitching dynamics and map out the uncertainty of popular machine learning
approaches. In the MCC setting, an indirect-distance-measure based label embedding
tree leads to the discovery of asymmetry in the mixing geometries among labels'
point-clouds. A selected chain of complementary covariate feature groups
collectively brings out multi-order mixing geometric pattern categories. Such
categories then reveal the true nature of MCC predictive inferences. In the RMA
setting, multiple response features couple with multiple major covariate
features to demonstrate physical-principle-bearing manifolds with a lattice of
natural localities. With minor features' heterogeneous effects being locally
identified, such localities jointly weave their focal characteristics into
system understanding and provide a platform for RMA predictive inferences. Our
CEDA works for universal data types, adopts non-linear associations and
facilitates efficient feature selection and inference.
Bayesian analysis of the linear reaction norm model with unknown covariate
The reaction norm model is becoming a popular approach for the analysis of G x E interactions. In a classical reaction norm model, the expression of a genotype in different environments is described as a linear function (a reaction norm) of an environmental gradient or value. A common environmental value is defined as the mean performance of all genotypes in the environment, which is typically unknown. One approximation is to estimate the mean phenotypic performance in each environment, and then treat these estimates as known covariates in the model. However, a more satisfactory alternative is to infer environmental values simultaneously with the other parameters of the model. This study describes a method and its Bayesian MCMC implementation that makes this possible. Frequentist properties of the proposed method are tested in a simulation study. Estimates of parameters of interest agree well with the true values. Further, inferences about genetic parameters from the proposed method are similar to those derived from a reaction norm model using true environmental values. On the other hand, using phenotypic means as proxies for environmental values results in poor inferences.
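The two-step approximation mentioned in the abstract above (estimate each environment's mean phenotype, then treat it as a known covariate) can be sketched as follows. This is only the proxy approach that the abstract contrasts with joint Bayesian inference; it is not the MCMC method of the study. The layout `y[i][j]` (genotype `i`, environment `j`) and the function names are assumptions made for the example.

```python
def env_means(y):
    """Mean phenotype of all genotypes in each environment (the proxy)."""
    n_g, n_e = len(y), len(y[0])
    return [sum(y[i][j] for i in range(n_g)) / n_g for j in range(n_e)]


def fit_line(x, yv):
    """Ordinary least-squares intercept and slope of yv regressed on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(yv) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, yv))
    slope = sxy / sxx
    return my - slope * mx, slope


def reaction_norms(y):
    """Per-genotype (intercept, slope) using environment means as covariate."""
    h_hat = env_means(y)
    return [fit_line(h_hat, row) for row in y]


if __name__ == "__main__":
    # Assumed noise-free toy data: y_ij = a_i + b_i * h_j.
    a = [-1.0, 0.0, 1.0]
    b = [0.5, 1.0, 1.5]
    h = [-2.0, 0.0, 1.0, 3.0]
    y = [[a[i] + b[i] * hj for hj in h] for i in range(3)]
    print(reaction_norms(y))
```

In this noise-free toy case the intercepts average to zero and the slopes average to one, so the environment means coincide with the true environmental values and the regression recovers each line. With noisy phenotypes the environment means are themselves estimated with error, which is the situation in which the abstract reports poor inferences from the proxy approach.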