217 research outputs found

    Coupling geometry on binary bipartite networks: hypotheses testing on pattern geometry and nestedness

    Given a matrix representation of a binary bipartite network, a coupling geometry is computed, via permutation invariance, to approximate the minimum-energy macrostate of the network's system. Such a macrostate is supposed to constitute the intrinsic structures of the system, so that the coupling geometry should be taken as the information content, or even the nonparametric minimal sufficient statistics, of the network data. Pertinent null and alternative hypotheses, such as nestedness, are then to be formulated according to the macrostate. That is, any efficient test statistic needs to be a function of this coupling geometry. These conceptual architectures and mechanisms are by and large still missing from the community ecology literature, and their absence has made misconceptions prevalent in this research area. Here the algorithmically computed coupling geometry is shown to consist of deterministic multiscale block patterns, which are framed by two marginal ultrametric trees on the row and column axes, and of stochastic uniform randomness within each block found at the finest scale. Functionally, a series of increasingly larger ensembles of matrix mimicries is derived by conforming to the multiscale block configurations. Here matrix mimicking is subject to the constraints of the row and column sum sequences. Based on such a series of ensembles, a profile of distributions becomes a natural device for checking the validity of test statistics or structural indexes. An energy-based index is used for testing whether network data indeed contain structural geometry. A new block-based nestedness index is also proposed. Its validity is checked and compared with that of the existing ones. A computing paradigm, called Data Mechanics, and its application to one real data network are illustrated throughout the developments and discussions in this paper.
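
The abstract above describes matrix mimicries that keep the row and column sum sequences fixed. As a minimal sketch of that constraint only, the following uses the classical checkerboard swap on a 0/1 matrix. It is not the paper's Data Mechanics procedure, which additionally conforms to the multiscale block configurations; the function name `checkerboard_swap` and its parameters are hypothetical, chosen for illustration.

```python
import random


def checkerboard_swap(matrix, n_attempts, seed=None):
    """Return a randomized copy of a 0/1 matrix with unchanged row and column sums.

    Illustrative assumption: a plain checkerboard-swap null model,
    not the block-conforming mimicry described in the abstract.
    Requires at least two rows and two columns.
    """
    rng = random.Random(seed)
    m = [list(row) for row in matrix]  # work on a copy, leave the input intact
    n_rows, n_cols = len(m), len(m[0])
    for _ in range(n_attempts):
        # Pick two distinct rows and two distinct columns at random.
        r1, r2 = rng.sample(range(n_rows), 2)
        c1, c2 = rng.sample(range(n_cols), 2)
        a, b = m[r1][c1], m[r1][c2]
        c, d = m[r2][c1], m[r2][c2]
        # A checkerboard is [[1, 0], [0, 1]] or [[0, 1], [1, 0]].
        # Flipping all four cells leaves every row and column sum unchanged.
        if a == d and b == c and a != b:
            m[r1][c1], m[r1][c2] = 1 - a, 1 - b
            m[r2][c1], m[r2][c2] = 1 - c, 1 - d
    return m


if __name__ == "__main__":
    network = [[1, 1, 0], [1, 0, 0], [0, 1, 1]]
    mimic = checkerboard_swap(network, n_attempts=200, seed=1)
    print([sum(row) for row in mimic])        # row sums match the original
    print([sum(col) for col in zip(*mimic)])  # column sums match the original
```

Repeating the call with different seeds gives an ensemble of mimicries; evaluating a structural index on each member yields the kind of reference distribution against which an observed index value can be compared.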

    From patterned response dependency to structured covariate dependency: categorical-pattern-matching

    Data generated from a system of interest typically consist of measurements from an ensemble of subjects across multiple response and covariate features, and are naturally represented by one response matrix against one covariate matrix. Each of these two matrices is likely to embrace heterogeneous data types simultaneously: continuous, discrete and categorical. Here a matrix is used as a practical platform that ideally keeps the hidden dependency among and between subjects and features intact on its lattice. Response and covariate dependency are individually computed and expressed through multiscale blocks via a newly developed computing paradigm named Data Mechanics. We propose a categorical-pattern-matching approach to establish causal linkages in the form of information flows from patterned response dependency to structured covariate dependency. The strength of an information flow is evaluated by applying combinatorial information theory. This unified platform for system knowledge discovery is illustrated through five data sets. In each illustrative case, an information flow is demonstrated as an organization of discovered knowledge loci via emergent visible and readable heterogeneity. This unified approach fundamentally resolves many long-standing issues in data analysis, including statistical modeling, multiple response, renormalization and feature selection, without involving man-made structures or distributional assumptions. The results reported here reinforce the idea that linking patterns of response dependency to structures of covariate dependency is the true philosophical foundation underlying data-driven computing and learning in the sciences.
    Comment: 32 pages, 10 figures, 3 box picture

    Categorical Exploratory Data Analysis: From Multiclass Classification and Response Manifold Analytics perspectives of baseball pitching dynamics

    From the two coupled perspectives of Multiclass Classification (MCC) and Response Manifold Analytics (RMA), we develop Categorical Exploratory Data Analysis (CEDA) on the PITCHf/x database to extract the information content of Major League Baseball's (MLB) pitching dynamics. The MCC and RMA information contents are represented, respectively, by one collection of multiscale pattern categories from mixing geometries and one collection of global-to-local geometric localities from response-covariate manifolds. These collections shed light on the pitching dynamics and map out the uncertainty of popular machine learning approaches. In the MCC setting, a label embedding tree based on an indirect distance measure leads to the discovery of asymmetry in the mixing geometries among the labels' point clouds. A selected chain of complementary covariate feature groups collectively brings out multi-order mixing geometric pattern categories. Such categories then reveal the true nature of MCC predictive inferences. In the RMA setting, multiple response features couple with multiple major covariate features to demonstrate manifolds that bear physical principles and carry a lattice of natural localities. With the heterogeneous effects of minor features being locally identified, such localities jointly weave their focal characteristics into system understanding and provide a platform for RMA predictive inferences. Our CEDA works for universal data types, adopts non-linear associations and facilitates efficient feature selection and inference.

    A chronology of international business cycles through non-parametric decoding

    This paper introduces a new empirical strategy for the characterization of business cycles. It applies non-parametric decoding methods that classify a series into expansions and recessions without requiring specification of the underlying stochastic process generating the data. It then uses network analysis to combine the signals obtained from different economic indicators to generate a unique chronology. These methods generate a record of peak and trough dates comparable, and in one sense superior, to the NBER's own chronology. The methods are then applied to 22 OECD countries to obtain a global business cycle chronology.