217 research outputs found
Coupling geometry on binary bipartite networks: hypotheses testing on pattern geometry and nestedness
Given a matrix representation of a binary bipartite network, a coupling
geometry is computed, via permutation invariance, to approximate the
minimum energy macrostate of the network system. Such a macrostate is supposed
to constitute the intrinsic structures of the system, so that the coupling
geometry should be taken as information contents, or even the nonparametric
minimum sufficient statistics of the network data. Then pertinent null and
alternative hypotheses, such as nestedness, are to be formulated according to
the macrostate. That is, any efficient testing statistic needs to be a function
of this coupling geometry. These conceptual architectures and mechanisms are by
and large still missing from the community ecology literature, and their absence
has left misconceptions prevalent in this research area. Here the algorithmically
computed coupling geometry is shown to consist of deterministic multiscale
block patterns, which are framed by two marginal ultrametric trees on row and
column axes, and stochastic uniform randomness within each block found on the
finest scale. Functionally a series of increasingly larger ensembles of matrix
mimicries is derived by conforming to the multiscale block configurations. Here
matrix mimicking is meant to be subject to constraints of row and column sums
sequences. Based on such a series of ensembles, a profile of distributions
becomes a natural device for checking the validity of testing statistics or
structural indexes. An energy based index is used for testing whether network
data indeed contains structural geometry. A new block-based version of the
nestedness index is also proposed. Its validity is checked and compared with
that of the existing ones. A computing paradigm, called Data Mechanics, and its
application to one real data network are illustrated throughout the developments
and discussions in this paper.
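
The constraint described in the abstract above, that each matrix mimicry keeps the row and column sum sequences of the observed binary matrix, can be illustrated with a minimal sketch. It uses the classical checkerboard swap as an assumed stand-in; the abstract does not say how the paper generates its ensembles, and the function name `checkerboard_swap` and its parameters are hypothetical. The sketch applies swaps over the whole matrix, whereas conforming to a multiscale block configuration would presumably restrict them to within blocks.

```python
import random


def checkerboard_swap(matrix, n_swaps, seed=0):
    """Return a shuffled copy of a 0/1 matrix with unchanged row and column sums.

    Illustrative only: a generic margin-preserving shuffle, not the
    block-conforming mimicry procedure of the paper.
    """
    rng = random.Random(seed)
    m = [row[:] for row in matrix]  # copy so the input is left untouched
    n_rows, n_cols = len(m), len(m[0])
    for _ in range(n_swaps):
        r1, r2 = rng.sample(range(n_rows), 2)
        c1, c2 = rng.sample(range(n_cols), 2)
        # A 2x2 submatrix equal to [[1, 0], [0, 1]] or [[0, 1], [1, 0]]
        # can be flipped without changing any row or column sum.
        is_checkerboard = (
            m[r1][c1] == m[r2][c2]
            and m[r1][c2] == m[r2][c1]
            and m[r1][c1] != m[r1][c2]
        )
        if is_checkerboard:
            m[r1][c1], m[r1][c2] = m[r1][c2], m[r1][c1]
            m[r2][c1], m[r2][c2] = m[r2][c2], m[r2][c1]
    return m


observed = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
]
mimic = checkerboard_swap(observed, n_swaps=200, seed=1)
print([sum(row) for row in mimic])        # row sums of the mimicry
print([sum(col) for col in zip(*mimic)])  # column sums of the mimicry
```

Evaluating a structural index on many such mimicries gives one distribution against which the observed value can be compared; the abstract's profile of distributions would come from repeating this over the series of ensembles.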
From patterned response dependency to structured covariate dependency: categorical-pattern-matching
Data generated from a system of interest typically consists of measurements
from an ensemble of subjects across multiple response and covariate features,
and is naturally represented by one response-matrix against one
covariate-matrix. Likely each of these two matrices simultaneously embraces
heterogeneous data types: continuous, discrete and categorical. Here a matrix
is used as a practical platform to ideally keep hidden dependency among/between
subjects and features intact on its lattice. Response and covariate dependency
is individually computed and expressed through multiscale blocks via a newly
developed computing paradigm named Data Mechanics. We propose a categorical
pattern matching approach to establish causal linkages in a form of information
flows from patterned response dependency to structured covariate dependency.
The strength of an information flow is evaluated by applying the combinatorial
information theory. This unified platform for system knowledge discovery is
illustrated through five data sets. In each illustrative case, an information
flow is demonstrated as an organization of discovered knowledge loci via
emergent visible and readable heterogeneity. This unified approach
fundamentally resolves many long-standing issues in data analysis, including
statistical modeling, multiple responses, renormalization and feature
selection, without involving man-made structures or distributional
assumptions. The results reported here enhance the idea that linking patterns
of response dependency to structures of covariate dependency is the true
philosophical foundation underlying data-driven computing and learning in
sciences.
Comment: 32 pages, 10 figures, 3 box pictures
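
The abstract above states that the strength of an information flow is evaluated with combinatorial information theory, but gives no formula. As a rough illustration only, the sketch below uses Shannon conditional entropy between two sets of categorical labels as an assumed stand-in: response pattern categories on one side, covariate structure categories on the other. The function names and the toy labels are hypothetical and not taken from the paper.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of categorical labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def conditional_entropy(response, covariate):
    """H(response | covariate) in bits; lower means a stronger linkage."""
    n = len(response)
    groups = {}
    for r, c in zip(response, covariate):
        groups.setdefault(c, []).append(r)
    # Weighted average of the response entropy within each covariate category.
    return sum(len(g) / n * entropy(g) for g in groups.values())


# Toy labels: each subject has one response cluster and one covariate cluster.
response = ["a", "a", "b", "b"]
aligned = ["x", "x", "y", "y"]    # covariate clusters match the response
unrelated = ["x", "y", "x", "y"]  # covariate clusters carry no information

print(entropy(response))
print(conditional_entropy(response, aligned))
print(conditional_entropy(response, unrelated))
```

Under this stand-in, a drop from the marginal entropy of the response labels to a much smaller conditional entropy would indicate that the covariate categories account for most of the response pattern.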
Categorical Exploratory Data Analysis: From Multiclass Classification and Response Manifold Analytics perspectives of baseball pitching dynamics
From two coupled Multiclass Classification (MCC) and Response Manifold
Analytics (RMA) perspectives, we develop Categorical Exploratory Data Analysis
(CEDA) on the PITCHf/x database for the information content of Major League
Baseball's (MLB) pitching dynamics. MCC and RMA information contents are
represented by one collection of multiscale pattern categories from mixing
geometries and one collection of global-to-local geometric localities from
response-covariate manifolds, respectively. These collections shed light on the
pitching dynamics and map out the uncertainty of popular machine learning
approaches. In the MCC setting, a label embedding tree based on an indirect
distance measure leads to the discovery of asymmetry in mixing geometries among labels'
point-clouds. A selected chain of complementary covariate feature groups
collectively brings out multi-order mixing geometric pattern categories. Such
categories then reveal the true nature of MCC predictive inferences. In the RMA
setting, multiple response features couple with multiple major covariate
features to demonstrate physical-principle-bearing manifolds with a lattice of
natural localities. With minor features' heterogeneous effects being locally
identified, such localities jointly weave their focal characteristics into
system understanding and provide a platform for RMA predictive inferences. Our
CEDA works for universal data types, adopts non-linear associations and
facilitates efficient feature selection and inference.
A chronology of international business cycles through non-parametric decoding
This paper introduces a new empirical strategy for the characterization of business cycles. It combines non-parametric decoding methods that classify a series into expansions and recessions but do not require specification of the underlying stochastic process generating the data. It then uses network analysis to combine the signals obtained from different economic indicators to generate a unique chronology. These methods generate a record of peak and trough dates comparable, and in one sense superior, to the NBER's own chronology. The methods are then applied to 22 OECD countries to obtain a global business cycle chronology.