2,836 research outputs found

    Data complexity in machine learning

    We investigate the role of data complexity in the context of binary classification problems. The universal data complexity is defined for a data set as the Kolmogorov complexity of the mapping enforced by the data set. It is closely related to several existing principles used in machine learning, such as Occam's razor, the minimum description length principle, and the Bayesian approach. The data complexity can also be defined with respect to a learning model, which is more realistic for applications. We demonstrate the application of data complexity in two learning problems, data decomposition and data pruning. In data decomposition, we illustrate that a data set is best approximated by its principal subsets, which are Pareto optimal with respect to complexity and set size. In data pruning, we show that outliers usually have high complexity contributions, and we propose methods for estimating the complexity contribution. Since in practice we have to approximate the ideal data complexity measures, we also discuss the impact of such approximations.
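    The ideal complexity measure above (Kolmogorov complexity) is uncomputable, so any practical use relies on a proxy. The sketch below only illustrates the data-pruning idea, not the paper's own estimator: it scores each example by the leave-one-out change in a crude MDL-style description-length proxy (model bits plus exception bits); the function names, the 32-bit parameter cost, and the choice of logistic regression are assumptions made for illustration.

```python
# Illustrative sketch only: an MDL-style stand-in for the uncomputable data
# complexity of a binary classification set, and a leave-one-out estimate of
# each example's complexity contribution. All constants are assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

def complexity_proxy(X, y, bits_per_param=32):
    """Description-length proxy: parameter bits plus one index per misclassified example."""
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    n_params = X.shape[1] + 1                    # weights + intercept
    n_errors = int((clf.predict(X) != y).sum())
    index_bits = np.log2(max(len(y), 2))         # bits needed to name one example
    return n_params * bits_per_param + n_errors * index_bits

def complexity_contribution(X, y, i):
    """How much example i adds to the data-set complexity proxy (leave-one-out difference)."""
    mask = np.arange(len(y)) != i
    return complexity_proxy(X, y) - complexity_proxy(X[mask], y[mask])

# Examples with unusually large contributions are candidate outliers for pruning.
```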

    Polarimetric radar data decomposition and interpretation

    Significant efforts have been made to decompose polarimetric radar data into several simple scattering components. The components, which are selected because of their physical significance, can be used to classify SAR (Synthetic Aperture Radar) image data. If particular components can be related to forest parameters, inversion procedures may be developed to estimate these parameters from the scattering components. Several methods have been used to decompose an averaged Stokes matrix or covariance matrix into three components representing odd-bounce (surface), even-bounce (double-bounce), and diffuse (volume) scattering. With these decomposition techniques, phenomena such as canopy-ground interactions, randomness of orientation, and size of scatterers can be examined from SAR data. In this study we applied the method recently reported by van Zyl (1992) to decompose averaged backscattering covariance matrices extracted from JPL SAR images over forest stands in Maine, USA. These stands are mostly mixed stands of coniferous and deciduous trees. Biomass data have been derived from field measurements of DBH (diameter at breast height) and tree density using allometric equations. The interpretation of the decompositions and their relationships with measured stand biomass are presented in this paper.
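    As a rough illustration of covariance-matrix decomposition (not the exact van Zyl 1992 algorithm), the sketch below eigen-decomposes an averaged 3x3 polarimetric covariance matrix into rank-one scattering components and reports the fraction of total power each carries; interpreting each component physically and relating component powers to stand biomass would follow as in the study.

```python
# Conceptual sketch: split an averaged Hermitian covariance matrix C (lexicographic
# basis [S_hh, sqrt(2) S_hv, S_vv]) into rank-one components via eigen-decomposition.
import numpy as np

def eigen_components(C):
    """Return (power_fraction, rank_one_matrix) pairs sorted by decreasing power."""
    C = 0.5 * (C + C.conj().T)              # enforce Hermitian symmetry numerically
    w, V = np.linalg.eigh(C)                # real eigenvalues for a Hermitian matrix
    total = w.sum()
    out = []
    for k in np.argsort(w)[::-1]:
        v = V[:, k:k + 1]
        out.append((w[k] / total, w[k] * (v @ v.conj().T)))
    return out

# The structure of each eigenvector suggests the scattering mechanism (e.g. surface
# vs. double-bounce), and the component powers can be compared against field biomass.
```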

    Spatio-Temporal Multiway Data Decomposition Using Principal Tensor Analysis on k-Modes: The R Package PTAk

    The purpose of this paper is to describe the R package PTAk and how the spatio-temporal context can be taken into account in the analyses. Essentially, PTAk() is a multiway multidimensional method to decompose a multi-entry data array, seen mathematically as a tensor of any order. This PTAk-modes method proposes a way of generalizing the SVD (singular value decomposition), as well as some other well known methods included in the R package, such as PARAFAC or CANDECOMP and the PCAn-modes or Tucker-n model. The example datasets cover different domains with various spatio-temporal characteristics and issues: (i) medical imaging in neuropsychology with a functional MRI (magnetic resonance imaging) study, (ii) pharmaceutical research with a pharmacodynamic study with EEG (electro-encephalographic) data for a central nervous system (CNS) drug, and (iii) geographical information system (GIS) with a climatic dataset that characterizes arid and semi-arid variations. All the methods implemented in the R package PTAk also support non-identity metrics, as well as penalizations during the optimization process. As a result of these flexibilities, together with pre-processing facilities, PTAk constitutes a framework for devising extensions of multidimensional methods such as correspondence analysis, discriminant analysis, and multidimensional scaling, also enabling spatio-temporal constraints.
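    PTAk itself is an R package with its own PTA-k-modes algorithm; as a language-neutral illustration of the underlying idea of generalizing the SVD to a multiway array, the numpy sketch below computes a higher-order SVD (mode-wise unfoldings plus a core tensor), which is a related but distinct decomposition and not PTAk's API.

```python
# Conceptual illustration only: higher-order SVD of a 3-way array, not the PTA-k-modes
# algorithm implemented by the R package PTAk.
import numpy as np

def unfold(T, mode):
    """Matricize tensor T along the given mode (mode-n unfolding)."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hosvd(T):
    """Return one orthonormal factor matrix per mode and the core tensor."""
    factors = [np.linalg.svd(unfold(T, m), full_matrices=False)[0] for m in range(T.ndim)]
    core = T
    for m, U in enumerate(factors):
        # contract mode m of the core with U^H (mode-m product)
        core = np.moveaxis(np.tensordot(U.conj().T, np.moveaxis(core, m, 0), axes=1), 0, m)
    return factors, core

# Example: a small space x time x variable array.
T = np.random.rand(10, 12, 4)
factors, core = hosvd(T)
```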

    A UNIFIED MODEL OF SOFTWARE AND DATA DECOMPOSITION

    Decomposition is an important part of information systems analysis and design and is manifested as the breakdown of the system into elements such as subsystems, modules, activities, processes, entities, and objects. Good decomposition is considered a major requirement for a good system design. However, there is no comprehensive theory of information systems decomposition, and no single dominant decomposition approach exists. Consequently, software decomposition relies on guidelines and the designer's experience. In this article, we propose a foundation for a theory of good decomposition based on two principles: 1) the decomposition of an information system should reflect the nature of the real-world system represented by it, and 2) static and dynamic aspects of systems cannot be separated and hence good decomposition should be based on both. The model enables the analysis of concepts such as good software modules, normalized relations, objects, and entities as special cases of one generalized construct.

    Large-scale Geometric Data Decomposition, Processing and Structured Mesh Generation

    Mesh generation is a fundamental and critical problem in geometric data modeling and processing. In most scientific and engineering tasks that involve numerical computations and simulations on 2D/3D regions or on curved geometric objects, discretizing or approximating the geometric data using a polygonal or polyhedral mesh is always the first step of the procedure. The quality of this tessellation often dictates the subsequent computation accuracy, efficiency, and numerical stability. Compared with unstructured meshes, structured meshes are favored in many scientific and engineering tasks because of their good properties. However, generating high-quality structured meshes remains challenging, especially for complex or large-scale geometric data. In industrial Computer-aided Design/Engineering (CAD/CAE) pipelines, the geometry processing required to create a desirable structured mesh of a complex model is the most costly step. This step is semi-manual and often takes up to several weeks to finish. Several technical challenges remain unsolved in existing structured mesh generation techniques. This dissertation studies the effective generation of structured meshes on large and complex geometric data. We study a general geometric computation paradigm that solves this problem via model partitioning and divide-and-conquer. To apply divide-and-conquer effectively, we study two key technical components: shape decomposition in the divide stage, and structured meshing in the conquer stage. We test our algorithm on various data sets, and the results demonstrate the efficiency and effectiveness of our framework. The comparisons also show that our algorithm outperforms existing partitioning methods in final meshing quality. We also show that our pipeline scales up efficiently in an HPC environment.
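    To make the conquer stage concrete, the sketch below shows one standard per-block ingredient of structured meshing: a quad grid on a single four-sided block obtained by transfinite (Coons) interpolation of its boundary polylines. It is only an illustrative building block under assumed inputs, not the dissertation's decomposition-and-meshing pipeline.

```python
# Minimal sketch: structured (quad) grid on one four-sided block by transfinite
# (Coons patch) interpolation of its four boundary polylines.
import numpy as np

def coons_patch(bottom, top, left, right):
    """Blend four boundary polylines into a structured grid of shape (n, m, 2).

    bottom/top have n points (parameter u), left/right have m points (parameter v);
    corners must match, e.g. bottom[0] == left[0] and top[-1] == right[-1].
    """
    n, m = len(bottom), len(left)
    u = np.linspace(0.0, 1.0, n)[:, None, None]
    v = np.linspace(0.0, 1.0, m)[None, :, None]
    ruled_u = (1 - v) * bottom[:, None, :] + v * top[:, None, :]
    ruled_v = (1 - u) * left[None, :, :] + u * right[None, :, :]
    corners = ((1 - u) * (1 - v) * bottom[0] + u * (1 - v) * bottom[-1]
               + (1 - u) * v * top[0] + u * v * top[-1])
    return ruled_u + ruled_v - corners

# Example: a 9 x 5 structured grid on the unit square.
s = np.linspace(0, 1, 9); t = np.linspace(0, 1, 5)
grid = coons_patch(np.c_[s, 0*s], np.c_[s, 0*s + 1], np.c_[0*t, t], np.c_[0*t + 1, t])
```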