    Data compression and regression based on local principal curves.

    Frequently the predictor space of a multivariate regression problem of the type y = m(x_1, …, x_p) + ε is intrinsically one-dimensional, or at least of far lower dimension than p. Usual modeling attempts such as the additive model y = m_1(x_1) + … + m_p(x_p) + ε, which try to reduce the complexity of the regression problem by making additional structural assumptions, are then inefficient as they ignore the inherent structure of the predictor space and involve complicated model and variable selection stages. In a fundamentally different approach, one may consider first approximating the predictor space by a (usually nonlinear) curve passing through it, and then regressing the response only against the one-dimensional projections onto this curve. This entails the reduction from a p-dimensional to a one-dimensional regression problem. As a tool for the compression of the predictor space we apply local principal curves. Building on the results presented in Einbeck et al. (Classification – The Ubiquitous Challenge. Springer, Heidelberg, 2005, pp. 256–263), we show how local principal curves can be parametrized and how the projections are obtained. The regression step can then be carried out using any nonparametric smoother. We illustrate the technique using data from the physical sciences.
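
The two-step idea in this abstract (project the predictors onto a curve, then regress on the one-dimensional projection) can be sketched as follows. This is only an illustration under stated assumptions: the curve is taken as an already fitted polyline given by `nodes`, which stands in for a local principal curve and is not the authors' fitting procedure, and the smoother is a plain Nadaraya-Watson estimator chosen as one example of "any nonparametric smoother". The names `project_to_polyline` and `kernel_smooth` are hypothetical.

```python
import numpy as np

def project_to_polyline(X, nodes):
    """Arc-length parameter of each row of X after projection onto a polyline.

    Assumes `nodes` (m+1, p) has no repeated consecutive points.
    """
    seg_start = nodes[:-1]                        # (m, p) segment start points
    seg_vec = nodes[1:] - nodes[:-1]              # (m, p) segment direction vectors
    seg_len = np.linalg.norm(seg_vec, axis=1)     # (m,) segment lengths
    # Arc length accumulated at the start of each segment.
    cum_len = np.concatenate([[0.0], np.cumsum(seg_len)])[:-1]

    # Relative position of the projection on every segment, clipped to [0, 1].
    diff = X[:, None, :] - seg_start[None, :, :]                  # (n, m, p)
    frac = np.clip((diff * seg_vec).sum(-1) / seg_len**2, 0.0, 1.0)
    foot = seg_start[None] + frac[..., None] * seg_vec[None]      # (n, m, p)
    dist = np.linalg.norm(X[:, None, :] - foot, axis=-1)          # (n, m)

    best = dist.argmin(axis=1)                    # closest segment per point
    rows = np.arange(len(X))
    return cum_len[best] + frac[rows, best] * seg_len[best]

def kernel_smooth(t_train, y_train, t_new, bandwidth):
    """Nadaraya-Watson regression of y on the curve parameter t (Gaussian kernel)."""
    w = np.exp(-0.5 * ((t_new[:, None] - t_train[None, :]) / bandwidth) ** 2)
    return (w @ y_train) / w.sum(axis=1)
```

With `t = project_to_polyline(X, nodes)`, the p-dimensional problem reduces to smoothing `y` against the single variable `t`, for example `kernel_smooth(t, y, t_grid, bandwidth=0.1)`, where the bandwidth value is an arbitrary placeholder.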

    Knowledge discovery from high-frequency stream nitrate concentrations: hydrology and biology contributions

    High-frequency, in-situ monitoring provides large environmental datasets. These datasets will likely bring new insights into landscape functioning and process-scale understanding. However, tailoring data analysis methods is necessary. Here, we detach our analysis from the usual temporal analysis performed in hydrology to determine if it is possible to infer general rules regarding hydrochemistry from available large datasets. We combined a 2-year in-stream nitrate concentration time series (time resolution of 15 min) with concurrent hydrological, meteorological and soil moisture data. We removed the low-frequency variations through low-pass filtering, which suppressed seasonality. We then analyzed the high-frequency variability component using Pareto Density Estimation, which to our knowledge has not been applied to hydrology. The resulting distribution of nitrate concentrations revealed three normally distributed modes: low, medium and high. Studying the environmental conditions for each mode revealed the main control of nitrate concentration: the saturation state of the riparian zone. We found low nitrate concentrations under conditions of hydrological connectivity and dominant denitrifying biological processes, and we found high nitrate concentrations under hydrological recession conditions and dominant nitrifying biological processes. These results generalize our understanding of hydro-biogeochemical nitrate flux controls and bring useful information to the development of nitrogen process-based models at the landscape scale.
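
The filtering step described above (estimate the slow variation with a low-pass filter, then keep what remains) can be illustrated with a minimal sketch. The abstract does not say which filter was used, so the centered moving average here is an assumption, as are the function name `high_frequency_component` and any window length. The Pareto Density Estimation step is not reproduced.

```python
import numpy as np

def high_frequency_component(series, window):
    """Subtract a centered moving average (a simple low-pass filter) from a series.

    `window` is the number of samples averaged; it must be odd so that the
    average is centered on each sample. Edges are handled by repeating the
    first and last values (an assumption of this sketch).
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so the average is centered")
    x = np.asarray(series, dtype=float)
    half = window // 2
    padded = np.pad(x, half, mode="edge")
    low = np.convolve(padded, np.ones(window) / window, mode="valid")
    return x - low   # high-frequency residual, same length as the input
```

At a 15-minute resolution there are 96 samples per day, so a window covering several days or weeks would be a natural starting point for suppressing seasonality; the right length depends on the data.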

    The architecture of emergent self-organizing maps to reduce projection errors

    There are mainly two types of Emergent Self-Organizing Maps (ESOM) grid structures in use: hexgrid (honeycomb-like) and quadgrid (trellis-like) maps. In addition to that, the shape of the maps may be square or rectangular. This work investigates the effects of these different map layouts. Hexgrids were found to have no convincing advantage over quadgrids. Rectangular maps, however, are distinctively superior to square maps. Most surprisingly, rectangular maps outperform square maps for isotropic data, i.e. data sets with no particular primary direction.

    Exploratory analysis of excitation-emission matrix fluorescence spectra with self-organizing maps as a basis for determination of organic matter removal efficiency at water treatment works

    In the paper, the self-organizing map (SOM) was employed for the exploratory analysis of fluorescence excitation-emission data characterizing organic matter removal efficiency at 16 water treatment works in the UK. Fluorescence spectroscopy was used to assess organic matter removal efficiency between raw and partially treated (clarified) water to provide an indication of the potential for disinfection by-products formation. Fluorescence spectroscopy was utilized to evaluate quantitative and qualitative properties of organic matter removal. However, the substantial amount of fluorescence data generated impeded the interpretation process. Therefore a robust SOM technique was used to examine the fluorescence data and to reveal patterns in data distribution and correlations between organic matter properties and fluorescence variables. It was found that the SOM provided a good discrimination between water treatment sites on the basis of spectral properties of organic matter. The distances between the units of the SOM map were indicative of the similarity of the fluorescence samples and thus demonstrated the relative changes in organic matter content between raw and clarified water. Larger distances between raw and clarified samples on the map corresponded to higher efficiency of organic matter removal. It was also shown that organic matter removal was highly dependent on the raw water fluorescence properties, with higher efficiencies for higher emission wavelengths in visible and UV humic-like fluorescence centers.

    Space-in-time and time-in-space self-organizing maps for exploring spatiotemporal patterns

    Spatiotemporal data pose serious challenges to analysts in geographic and other domains. Owing to the complexity of the geospatial and temporal components, this kind of data cannot be analyzed by fully automatic methods but requires the involvement of the human analyst's expertise. For a comprehensive analysis, the data need to be considered from two complementary perspectives: (1) as spatial distributions (situations) changing over time and (2) as profiles of local temporal variation distributed over space. In order to support the visual analysis of spatiotemporal data, we suggest a framework based on the “Self-Organizing Map” (SOM) method combined with a set of interactive visual tools supporting both analytic perspectives. SOM can be considered as a combination of clustering and dimensionality reduction. In the first perspective, SOM is applied to the spatial situations at different time moments or intervals. In the other perspective, SOM is applied to the local temporal evolution profiles. The integrated visual analytics environment includes interactive coordinated displays enabling various transformations of spatiotemporal data and post-processing of SOM results. The SOM matrix display offers an overview of the groupings of data objects and their two-dimensional arrangement by similarity. This view is linked to a cartographic map display, a time series graph, and a periodic pattern view. The linkage of these views supports the analysis of SOM results in both the spatial and temporal contexts. The variable SOM grid coloring serves as an instrument for linking the SOM with the corresponding items in the other displays. The framework has been validated on a large dataset with real city traffic data, where expected spatiotemporal patterns have been successfully uncovered. We also describe the use of the framework for discovery of previously unknown patterns in a 41-year time series of 7 crime rate attributes in the states of the USA.

    Batch kernel SOM and related Laplacian methods for social network analysis

    Large graphs are natural mathematical models for describing the structure of the data in a wide variety of fields, such as web mining, social networks, information retrieval, biological networks, etc. For all these applications, automatic tools are required to get a synthetic view of the graph and to reach a good understanding of the underlying problem. In particular, discovering groups of tightly connected vertices and understanding the relations between those groups is very important in practice. This paper shows how a kernel version of the batch Self Organizing Map can be used to achieve these goals via kernels derived from the Laplacian matrix of the graph, especially when it is used in conjunction with more classical methods based on the spectral analysis of the graph. The proposed method is used to explore the structure of a medieval social network modeled through a weighted graph that has been directly built from a large corpus of agrarian contracts.
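
As one concrete example of a kernel derived from the Laplacian matrix of a weighted graph, the sketch below computes the heat (diffusion) kernel exp(-βL) with L = D - W. The abstract does not state which Laplacian kernels the paper uses, so the choice of the heat kernel, the function name `heat_kernel`, and the parameter `beta` are assumptions made for illustration.

```python
import numpy as np

def heat_kernel(weights, beta):
    """Heat kernel K = exp(-beta * L) of an undirected weighted graph.

    `weights` is a symmetric (n, n) matrix of non-negative edge weights and
    L = D - W is the unnormalized graph Laplacian.
    """
    W = np.asarray(weights, dtype=float)
    laplacian = np.diag(W.sum(axis=1)) - W
    # L is symmetric, so eigh gives real eigenvalues and orthonormal eigenvectors.
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    # exp(-beta * L) = V diag(exp(-beta * lambda)) V^T
    return (eigvecs * np.exp(-beta * eigvals)) @ eigvecs.T
```

The resulting matrix is symmetric and positive definite, so it can serve as the Gram matrix of any kernel method, including a kernel SOM. Larger values of `beta` spread similarity further along the graph.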

    Emergence in Self Organizing Feature Maps

    This paper sheds some light on the differences between SOM and emergent SOM (ESOM). The discussion in philosophy and epistemology about Emergence is summarized in the form of postulates. The properties of SOM are compared to these postulates. SOM fulfill most of the postulates. The epistemological postulates regarding this issue are hard, if not impossible, to prove. An alternative postulate relying on semiotic concepts, called "semiotic irreducibility", is proposed here. This concept is applied to the U-Matrix on SOM with many neurons. This leads to the definition of ESOM as SOM producing a nontrivial U-Matrix on which the terms "watershed" and "catchment basin" are meaningful and which conform to the cluster structure. The usefulness of the approach is demonstrated with an ESOM clustering algorithm which exploits the emergent properties of such SOM. Results on synthetic data, including blind studies, are convincing. The application of ESOM clustering to a real-world problem led to an excellent solution.
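
For readers unfamiliar with the U-Matrix mentioned above, here is a minimal sketch of one common way to compute it: each neuron's height is the mean distance between its weight vector and those of its immediate grid neighbours. The assumptions are a rectangular quadgrid with 4-neighbourhoods and non-wrapping borders, and the name `u_matrix` is hypothetical. ESOM implementations may use other grids, toroidal borders, or sums instead of means.

```python
import numpy as np

def u_matrix(weights):
    """Mean distance from each neuron's weight vector to its 4-neighbours.

    `weights` has shape (rows, cols, dim). The grid needs at least two
    neurons in total so that every neuron has a neighbour.
    """
    W = np.asarray(weights, dtype=float)
    rows, cols, _ = W.shape
    total = np.zeros((rows, cols))   # summed neighbour distances
    count = np.zeros((rows, cols))   # number of neighbours per neuron

    # Distances between vertically adjacent neurons, credited to both ends.
    d_v = np.linalg.norm(W[1:] - W[:-1], axis=-1)
    total[1:] += d_v
    total[:-1] += d_v
    count[1:] += 1
    count[:-1] += 1

    # Distances between horizontally adjacent neurons, credited to both ends.
    d_h = np.linalg.norm(W[:, 1:] - W[:, :-1], axis=-1)
    total[:, 1:] += d_h
    total[:, :-1] += d_h
    count[:, 1:] += 1
    count[:, :-1] += 1

    return total / count
```

Displayed as a height map, ridges of large values separate regions of small values, which is where the "watershed" and "catchment basin" vocabulary in the abstract comes from.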

    Facultative Aestivation in a Tropical Freshwater Turtle Chelodina rugosa

    1. Chelodina rugosa dug from aestivation sites at the end of the dry season were immediately alert and well coordinated. 2. Compared with non-aestivating animals, aestivating turtles had 20% higher plasma osmotic pressure and 7% higher sodium. Coupled with a small, but significant weight gain upon return to the water, this suggested the occurrence of minor dehydration in aestivating animals. 3. Plasma lactate levels of aestivating animals were low, averaging 1.99 mmol/l, consistent with aerobic rather than anaerobic metabolism having sustained their long period underground. 4. No evidence was seen of dramatic physiological specialization. Aestivation in this species is interpreted as a primarily behavioural adaptation, made possible by typically reptilian abilities to tolerate a wide range in plasma electrolytes and to survive long periods without feeding.

    Characterization of clastic sedimentary environments by clustering algorithm and several statistical approaches — case study, Sava Depression in Northern Croatia

    This study demonstrates a method to identify and characterize some facies of turbiditic depositional environments. The study area is a hydrocarbon field in the Sava Depression (Northern Croatia). Its Upper Miocene reservoirs have been proved to represent a lacustrine turbidite system. In the workflow, first an unsupervised neural network was applied as clustering method for two sandstone reservoirs. The elements of the input vectors were the basic petrophysical parameters. In the second step autocorrelation surfaces were used to reveal the hidden anisotropy of the grid. This anisotropy is supposed to identify the main continuity directions in the geometrical analyses of sandstone bodies. Finally, in the description of clusters several parametric and nonparametric statistics were used to characterize the identified facies. Obtained results correspond to the previously published interpretation of those reservoir facies.