
    Recurrence-based time series analysis by means of complex network methods

    Complex networks are an important paradigm of modern complex systems science, allowing quantitative assessment of the structural properties of systems composed of interacting entities. In recent years, intensive efforts have been devoted to applying network-based concepts to the analysis of dynamically relevant higher-order statistical properties of time series. Notably, many of the corresponding approaches are closely related to the concept of recurrence in phase space. In this paper, we review recent methodological advances in time series analysis based on complex networks, with special emphasis on methods founded on recurrence plots. The potentials and limitations of the individual methods are discussed and illustrated for paradigmatic examples of dynamical systems as well as for real-world time series. Complex network measures are shown to provide information about structural features of dynamical systems that is complementary to that characterized by other methods of time series analysis and hence substantially enriches the knowledge gathered from existing (linear as well as nonlinear) approaches.
    Comment: To be published in International Journal of Bifurcation and Chaos (2011).
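As a minimal illustration of the recurrence idea these methods build on (not the authors' code: the toy series, threshold, and function name are assumptions, and real applications would first embed the series in phase space), a recurrence matrix can be read directly as the adjacency matrix of a network:

```python
import numpy as np

def recurrence_network(x, eps):
    """Build a recurrence-network adjacency matrix from a scalar time series.

    Each time point is a node; nodes i and j are linked if the corresponding
    states are closer than the recurrence threshold eps. Self-loops are removed.
    """
    d = np.abs(x[:, None] - x[None, :])   # pairwise distances between states
    A = (d < eps).astype(int)             # recurrence matrix as 0/1 adjacency
    np.fill_diagonal(A, 0)                # drop self-loops
    return A

x = np.sin(np.linspace(0, 8 * np.pi, 200))  # toy periodic series
A = recurrence_network(x, eps=0.1)
degree = A.sum(axis=1)                      # node degree = local recurrence density
```

Network measures such as the degree sequence above, transitivity, or average path length computed on `A` then characterize the geometry of the underlying attractor.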

    New Approaches to Mapping Forest Conditions and Landscape Change from Moderate Resolution Remote Sensing Data across the Species-Rich and Structurally Diverse Atlantic Northern Forest of Northeastern North America

    The sustainable management of forest landscapes requires an understanding of the functional relationships between management practices, changes in landscape conditions, and ecological response. This presents a substantial need for spatial information in support of both applied research and adaptive management. Satellite remote sensing has the potential to address much of this need, but forest conditions and patterns of change remain difficult to synthesize over large areas and long time periods. Compounding this problem is error in forest attribute maps and consequent uncertainty in subsequent analyses. The research described in this document is directed at these long-standing problems. Chapter 1 demonstrates a generalizable approach to the characterization of predominant patterns of forest landscape change. Within a ~1.5 Mha northwest Maine study area, a time series of satellite-derived forest harvest maps (1973-2010) served as the basis for grouping landscape units according to time series of cumulative harvest area. Different groups reflected different harvest histories, which were linked to changes in landscape composition and configuration through time series of selected landscape metrics. Time series data resolved differences in landscape change attributable to passage of the Maine Forest Practices Act, a major change in forest policy. Our approach should be of value in supporting empirical landscape research. Perhaps the single most important source of uncertainty in the characterization of landscape conditions is over- or under-representation of class prevalence caused by prediction bias. Systematic error is similarly impactful in maps of continuous forest attributes, where regression dilution or attenuation bias causes the overestimation of low values and underestimation of high values. In both cases, patterns of error tend to produce more homogeneous characterizations of landscape conditions.
Chapters 2 and 3 present a machine learning method designed to simultaneously reduce systematic and total error in continuous and categorical maps, respectively. By training support vector machines with a multi-objective genetic algorithm, attenuation bias was substantially reduced in regression models of tree species relative abundance (chapter 2), and prediction bias was effectively removed from classification models predicting tree species occurrence and forest disturbance (chapter 3). This approach is generalizable to other prediction problems, other regions, and other geospatial disciplines.
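The attenuation bias the dissertation targets is easy to reproduce numerically. The sketch below uses synthetic data (the true slope, noise level, and variable names are assumptions, not the study's data) to show how noise in a predictor shrinks the ordinary-least-squares slope toward zero, which in mapped attributes overestimates low values and underestimates high ones:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
x_true = rng.normal(0.0, 1.0, n)            # true (unobserved) forest attribute
y = 2.0 * x_true                            # response with true slope 2.0
x_obs = x_true + rng.normal(0.0, 1.0, n)    # predictor measured with unit noise

# OLS slope of y on the noisy predictor
slope = np.cov(x_obs, y)[0, 1] / np.var(x_obs)

# Classical attenuation: E[slope] = 2.0 * var(x)/(var(x)+var(noise)) = 1.0,
# half the true slope, so high values are underpredicted and low values overpredicted.
```

Bias-aware training objectives, such as the multi-objective fitting described above, aim to counteract exactly this shrinkage.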

    A New-Fangled FES-k-Means Clustering Algorithm for Disease Discovery and Visual Analytics

    The central purpose of this study is to further evaluate the performance of a new algorithm designed to increase the overall efficiency of the original k-means clustering technique: the Fast, Efficient, and Scalable k-means algorithm (FES-k-means). The FES-k-means algorithm uses a hybrid approach comprising a k-d tree data structure that enhances the nearest-neighbor query, the original k-means algorithm, and an adaptation rate proposed by Mashor. The algorithm was tested on two real datasets and one synthetic dataset. It was applied twice to all three datasets: once on data trained by the innovative MIL-SOM method and then on the actual untrained data, in order to evaluate its competence. This two-step approach of training the data prior to clustering provides a solid foundation for knowledge discovery and data mining not attainable by clustering alone. The benefits of the method are that it produces clusters similar to those of the original k-means method at a much faster rate, as shown by runtime comparisons, and that it enables efficient analysis of large geospatial data, with implications for disease mechanism discovery. From that perspective, it is hypothesized that the linear-like pattern of elevated blood lead levels discovered in the city of Chicago may be spatially linked to the city's water service lines.
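The hybrid design, k-means with the nearest-centroid step served by a k-d tree, can be sketched as follows. This is a simplified Lloyd loop, not the published FES-k-means: Mashor's adaptation rate and the MIL-SOM pre-training are omitted, and the function name and parameters are illustrative.

```python
import numpy as np
from scipy.spatial import cKDTree

def kdtree_kmeans(X, k, iters=20, init=None, seed=0):
    """Lloyd-style k-means in which the nearest-centroid query is answered
    by a k-d tree over the current centroids, mirroring the hybrid idea
    behind FES-k-means (adaptation-rate update omitted)."""
    rng = np.random.default_rng(seed)
    if init is None:
        init = X[rng.choice(len(X), k, replace=False)]
    centroids = np.array(init, dtype=float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        _, labels = cKDTree(centroids).query(X)   # k-d tree nearest-centroid search
        for j in range(k):
            members = X[labels == j]
            if len(members):                      # leave empty clusters in place
                centroids[j] = members.mean(axis=0)
    return centroids, labels
```

Rebuilding a small k-d tree over k centroids each iteration turns the assignment step from an O(nk) scan into an O(n log k) query, which is where the speedup over plain k-means comes from.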

    Visualization of Very Large High-Dimensional Data Sets as Minimum Spanning Trees

    The chemical sciences are producing an unprecedented amount of large, high-dimensional data sets containing chemical structures and associated properties. However, there are currently no algorithms to visualize such data while preserving both global and local features with a sufficient level of detail to allow for human inspection and interpretation. Here, we propose a solution to this problem with a new data visualization method, TMAP, capable of representing data sets of up to millions of data points and arbitrary high dimensionality as a two-dimensional tree (http://tmap.gdb.tools). Visualizations based on TMAP are better suited than t-SNE or UMAP for the exploration and interpretation of large data sets due to their tree-like nature, improved preservation of local and global neighborhoods and structure, and the transparency of the methods the algorithm is based on. We apply TMAP to widely used chemistry data sets, including databases of molecules such as ChEMBL, FDB17, the Natural Products Atlas, and DSSTox, as well as to the MoleculeNet benchmark collection of data sets. We also show its broad applicability with further examples from biology, particle physics, and literature.
    Comment: 33 pages, 14 figures, 1 table, supplementary information included.
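The core step of such a tree layout, building a minimum spanning tree over a k-nearest-neighbor graph, can be sketched with SciPy. This is a simplification: TMAP itself uses locality-sensitive hashing to scale the neighbor search to millions of points, and the function name and parameters here are illustrative.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial import cKDTree

def knn_mst(X, k=5):
    """Minimum spanning tree of the k-nearest-neighbor graph of X,
    the skeleton that tree-based layouts such as TMAP then draw in 2D."""
    dist, idx = cKDTree(X).query(X, k + 1)   # first neighbor is the point itself
    n = len(X)
    rows = np.repeat(np.arange(n), k)
    cols = idx[:, 1:].ravel()                # drop the self-neighbor column
    vals = dist[:, 1:].ravel()
    graph = csr_matrix((vals, (rows, cols)), shape=(n, n))
    return minimum_spanning_tree(graph)      # sparse matrix with n-1 edges if connected
```

Because a tree has only n-1 edges, laying it out avoids the dense crossing structure that makes large t-SNE or UMAP scatter plots hard to read.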

    A Comparative Evaluation of Quantification Methods

    Quantification is the problem of predicting class distributions in a given target set. It is also a growing research field in supervised machine learning, for which a large variety of algorithms has been proposed in recent years. However, a comprehensive empirical comparison of quantification methods that supports algorithm selection is not yet available. In this work, we close this research gap by conducting a thorough empirical performance comparison of 24 different quantification methods. To cover a broad range of scenarios for both binary and multiclass quantification settings, we carried out almost 3 million experimental runs on 40 data sets. We observe that no single algorithm generally outperforms all competitors, but identify a group of methods, including the Median Sweep and the DyS framework, that perform significantly better in binary settings. For the multiclass setting, we observe that a different, broad group of algorithms yields good performance, including the Generalized Probabilistic Adjusted Count, the readme method, the energy distance minimization method, the EM algorithm for quantification, and Friedman's method. More generally, we find that performance on multiclass quantification is inferior to the results obtained in the binary setting. Our results can guide practitioners who intend to apply quantification algorithms and help researchers identify opportunities for future research.
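The simplest family of binary quantifiers evaluated in such comparisons is easy to state in code. The sketch below implements Classify and Count and its Adjusted Count correction (standard textbook formulas; the function names and fixed threshold are illustrative, and in practice tpr/fpr are estimated by cross-validation on the training set):

```python
import numpy as np

def classify_and_count(scores, threshold=0.5):
    """Raw prevalence estimate: the fraction of instances the classifier
    labels positive. Biased whenever the classifier is imperfect."""
    return float(np.mean(scores >= threshold))

def adjusted_count(scores, tpr, fpr, threshold=0.5):
    """Adjusted Count (ACC): since E[CC] = p*tpr + (1-p)*fpr for true
    prevalence p, invert to get p = (CC - fpr) / (tpr - fpr)."""
    cc = classify_and_count(scores, threshold)
    return float(np.clip((cc - fpr) / (tpr - fpr), 0.0, 1.0))
```

Methods such as Median Sweep extend this idea by applying the ACC correction across many thresholds and taking the median of the resulting estimates.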