Recurrence-based time series analysis by means of complex network methods
Complex networks are an important paradigm of modern complex systems science,
allowing a quantitative assessment of the structural properties of systems
composed of different interacting entities. In recent years, intensive efforts
have been made to apply network-based concepts also to the analysis of
dynamically relevant higher-order statistical properties of time series.
Notably, many of these approaches are closely related to the concept of
recurrence in phase space. In this paper, we review recent
methodological advances in time series analysis based on complex networks, with
a special emphasis on methods founded on recurrence plots. The potentials and
limitations of the individual methods are discussed and illustrated for
paradigmatic examples of dynamical systems as well as for real-world time
series. Complex network measures are shown to provide information about
structural features of dynamical systems that are complementary to those
characterized by other methods of time series analysis and, hence,
substantially enrich the knowledge gathered from other existing (linear as well
as nonlinear) approaches.
Comment: To be published in International Journal of Bifurcation and Chaos (2011)
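The recurrence-network construction at the heart of this review can be sketched in a few lines. Under the usual ε-recurrence definition, each time-delay-embedded state x_i becomes a node, and two distinct nodes are linked when their states are strictly closer than a threshold ε, i.e. A_ij = Θ(ε − ‖x_i − x_j‖) − δ_ij. The plain-Python sketch below assumes the maximum norm, an illustrative embedding (dimension m, delay tau), an illustrative eps, and the logistic map as a toy series; none of these settings are taken from the paper. It computes two standard network measures: node degree and the local clustering coefficient.

```python
def embed(series, m, tau):
    """Time-delay embedding: state t is (x_t, x_{t+tau}, ..., x_{t+(m-1)tau})."""
    n = len(series) - (m - 1) * tau
    return [tuple(series[t + k * tau] for k in range(m)) for t in range(n)]


def recurrence_network(states, eps):
    """Adjacency matrix: A_ij = 1 if i != j and max-norm distance < eps, else 0."""
    n = len(states)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = max(abs(a - b) for a, b in zip(states[i], states[j]))
            if d < eps:
                adj[i][j] = adj[j][i] = 1
    return adj


def degrees(adj):
    """Number of recurrences (links) of each state."""
    return [sum(row) for row in adj]


def local_clustering(adj, i):
    """Fraction of pairs of neighbors of node i that are themselves linked."""
    nbrs = [j for j, a in enumerate(adj[i]) if a]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(adj[a][b] for pos, a in enumerate(nbrs) for b in nbrs[pos + 1:])
    return 2.0 * links / (k * (k - 1))


if __name__ == "__main__":
    # Toy series: logistic map x -> 4x(1 - x), a standard chaotic example.
    x, series = 0.4, []
    for _ in range(300):
        x = 4.0 * x * (1.0 - x)
        series.append(x)
    adj = recurrence_network(embed(series, m=2, tau=1), eps=0.05)
    n = len(adj)
    print("mean degree:", sum(degrees(adj)) / n)
    print("mean local clustering:", sum(local_clustering(adj, i) for i in range(n)) / n)
```

The choice of eps controls link density and therefore every derived network measure, which is why recurrence-network studies usually fix eps through a target recurrence rate rather than an absolute distance.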
New Approaches to Mapping Forest Conditions and Landscape Change from Moderate Resolution Remote Sensing Data across the Species-Rich and Structurally Diverse Atlantic Northern Forest of Northeastern North America
The sustainable management of forest landscapes requires an understanding of the functional relationships between management practices, changes in landscape conditions, and ecological response. This creates a substantial need for spatial information in support of both applied research and adaptive management. Satellite remote sensing has the potential to address much of this need, but forest conditions and patterns of change remain difficult to synthesize over large areas and long time periods. Compounding this problem is error in forest attribute maps and the consequent uncertainty in subsequent analyses. The research described in this document is directed at these long-standing problems.
Chapter 1 demonstrates a generalizable approach to characterizing the predominant patterns of forest landscape change. Within a ~1.5 Mha study area in northwestern Maine, a time series of satellite-derived forest harvest maps (1973-2010) served as the basis for grouping landscape units according to their time series of cumulative harvest area. Different groups reflected different harvest histories, which were linked to changes in landscape composition and configuration through time series of selected landscape metrics. The time series data resolved differences in landscape change attributable to the passage of the Maine Forest Practices Act, a major change in forest policy. Our approach should be of value in supporting empirical landscape research.
Perhaps the single most important source of uncertainty in the characterization of landscape conditions is over- or under-representation of class prevalence caused by prediction bias. Systematic error is similarly impactful in maps of continuous forest attributes, where regression dilution (attenuation bias) causes the overestimation of low values and the underestimation of high values. In both cases, patterns of error tend to produce more homogeneous characterizations of landscape conditions. Chapters 2 and 3 present a machine learning method designed to simultaneously reduce systematic and total error in continuous and categorical maps, respectively. By training support vector machines with a multi-objective genetic algorithm, attenuation bias was substantially reduced in regression models of tree species relative abundance (chapter 2), and prediction bias was effectively removed from classification models predicting tree species occurrence and forest disturbance (chapter 3). This approach is generalizable to other prediction problems, other regions, or other geospatial disciplines.
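Attenuation bias is easy to reproduce with simulated data. The sketch below uses ordinary least squares on a noisy covariate (plain Python; all data are synthetic, and this only illustrates the problem, not the support vector machine and genetic algorithm approach described above). Regressing the predictions back on the true values gives a slope below 1: predictions for low true values are too high and those for high true values are too low.

```python
import random
from statistics import mean


def ols_fit(x, y):
    """Least-squares line y = a + b*x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


random.seed(0)
# Simulated "true" attribute (think relative abundance in [0, 1]) and a
# noisy covariate standing in for imperfect remote-sensing predictors.
truth = [random.uniform(0.0, 1.0) for _ in range(2000)]
covariate = [t + random.gauss(0.0, 0.3) for t in truth]

# Fit truth ~ covariate and predict back onto the same units.
a, b = ols_fit(covariate, truth)
predicted = [a + b * c for c in covariate]

# Slope of predicted on true values: < 1 signals compression toward the mean.
_, slope = ols_fit(truth, predicted)
low = [p for t, p in zip(truth, predicted) if t < 0.2]
high = [p for t, p in zip(truth, predicted) if t > 0.8]

print(f"slope of predicted on true: {slope:.2f}")
print(f"mean prediction where truth < 0.2: {mean(low):.2f}")
print(f"mean prediction where truth > 0.8: {mean(high):.2f}")
```

For simple least squares with an intercept, this slope equals the in-sample R², so the weaker the predictor, the stronger the compression of predicted values toward the mean.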
A New-Fangled FES-k-Means Clustering Algorithm for Disease Discovery and Visual Analytics
The central purpose of this study is to further evaluate the performance of a new algorithm designed to increase the overall efficiency of the original k-means clustering technique: the Fast, Efficient, and Scalable k-means algorithm (FES-k-means). The FES-k-means algorithm uses a hybrid approach that combines the k-d tree data structure, which accelerates nearest-neighbor queries, the original k-means algorithm, and an adaptation rate proposed by Mashor. The algorithm was tested on two real datasets and one synthetic dataset. It was applied twice to each of the three datasets, once to data trained by the innovative MIL-SOM method and once to the actual untrained data, in order to evaluate its competence. This two-step approach of training the data before clustering provides a solid foundation for knowledge discovery and data mining that clustering methods alone do not offer. The benefits of this method are that it produces clusters similar to those of the original k-means method at a much faster rate, as shown by runtime comparisons, and that it provides efficient analysis of large geospatial data with implications for disease mechanism discovery. From a disease mechanism discovery perspective, it is hypothesized that the linear-like pattern of elevated blood lead levels discovered in the city of Chicago may be spatially linked to the city's water service lines.
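FES-k-means, as described above, pairs a k-d tree for fast nearest-neighbor queries with the standard k-means iteration. The plain-Python sketch below illustrates only that pairing: a small k-d tree is rebuilt over the current centroids in every iteration and answers the "which centroid is closest?" query of the assignment step. Building the tree over the centroids (rather than over the data), the random initialization, and all function names are assumptions for illustration; Mashor's adaptation rate and the MIL-SOM training stage are not reproduced.

```python
import random
from math import dist  # Python 3.8+


def build_kdtree(items, depth=0):
    """k-d tree over (point, index) pairs; each node is (item, axis, left, right)."""
    if not items:
        return None
    axis = depth % len(items[0][0])
    items = sorted(items, key=lambda item: item[0][axis])
    mid = len(items) // 2
    return (items[mid], axis,
            build_kdtree(items[:mid], depth + 1),
            build_kdtree(items[mid + 1:], depth + 1))


def nearest(node, query, best=None):
    """Return (distance, index) of the stored point closest to query."""
    if node is None:
        return best
    (point, idx), axis, left, right = node
    d = dist(point, query)
    if best is None or d < best[0]:
        best = (d, idx)
    diff = query[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, query, best)
    # Search the far side only if the splitting plane is closer than the best hit.
    if abs(diff) < best[0]:
        best = nearest(far, query, best)
    return best


def kmeans(data, k, iters=20, seed=0):
    """Lloyd-style k-means whose assignment step queries a k-d tree of centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(data, k)
    labels = [0] * len(data)
    for _ in range(iters):
        tree = build_kdtree([(c, i) for i, c in enumerate(centroids)])
        labels = [nearest(tree, x)[1] for x in data]
        new = []
        for i in range(k):
            members = [x for x, lab in zip(data, labels) if lab == i]
            # Keep the previous centroid if a cluster becomes empty.
            new.append(tuple(sum(col) / len(members) for col in zip(*members))
                       if members else centroids[i])
        if new == centroids:
            break
        centroids = new
    return centroids, labels


if __name__ == "__main__":
    rng = random.Random(1)
    blobs = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
    data = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
            for cx, cy in blobs for _ in range(100)]
    centroids, labels = kmeans(data, k=3)
    print(sorted((round(x, 1), round(y, 1)) for x, y in centroids))
```

A tree over only k centroids saves little work; the pruning in `nearest` pays off when each query has many candidate points to search.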
Visualization of Very Large High-Dimensional Data Sets as Minimum Spanning Trees
The chemical sciences are producing an unprecedented amount of large,
high-dimensional data sets containing chemical structures and associated
properties. However, there are currently no algorithms to visualize such data
while preserving both global and local features with a sufficient level of
detail to allow for human inspection and interpretation. Here, we propose a
solution to this problem with a new data visualization method, TMAP, capable of
representing data sets of up to millions of data points and arbitrarily high
dimensionality as a two-dimensional tree (http://tmap.gdb.tools).
Visualizations based on TMAP are better suited than t-SNE or UMAP for the
exploration and interpretation of large data sets due to their tree-like
nature, increased local and global neighborhood and structure preservation, and
the transparency of the methods the algorithm is based on. We apply TMAP to the
most widely used chemistry data sets, including databases of molecules such as ChEMBL,
FDB17, the Natural Products Atlas, DSSTox, as well as to the MoleculeNet
benchmark collection of data sets. We also show its broad applicability with
further examples from biology, particle physics, and literature.
Comment: 33 pages, 14 figures, 1 table, supplementary information included
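As the title indicates, the data are laid out as a minimum spanning tree. A simplified plain-Python sketch of that core step: connect each point to its k nearest neighbors, then extract a minimum spanning forest of this graph with Kruskal's algorithm and a union-find structure. The exact (brute-force) neighbor search and the parameter k are assumptions made for illustration. This is not TMAP's implementation (http://tmap.gdb.tools), which targets millions of points and also computes the two-dimensional layout of the tree.

```python
from math import dist  # Python 3.8+


def knn_edges(points, k):
    """Undirected edges (weight, i, j) linking each point to its k nearest neighbors."""
    edges = set()
    for i, p in enumerate(points):
        nbrs = sorted((dist(p, q), j) for j, q in enumerate(points) if j != i)[:k]
        for d, j in nbrs:
            edges.add((d, min(i, j), max(i, j)))
    return edges


def kruskal(n, edges):
    """Minimum spanning forest of an n-node graph (Kruskal + union-find)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    tree = []
    for w, i, j in sorted(edges):  # cheapest edges first
        ri, rj = find(i), find(j)
        if ri != rj:               # edge joins two components: keep it
            parent[ri] = rj
            tree.append((i, j, w))
    return tree


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (0.0, 3.0)]
    print(kruskal(len(pts), knn_edges(pts, k=2)))
```

Restricting the candidate edges to a k-nearest-neighbor graph keeps the edge count near n·k instead of n², at the cost of possibly returning a forest rather than a single tree when the graph is disconnected.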
A Comparative Evaluation of Quantification Methods
Quantification is the problem of predicting class distributions in a
given target set. It is also a growing research field in supervised
machine learning, for which a large variety of different algorithms has been
proposed in recent years. However, a comprehensive empirical comparison of
quantification methods that supports algorithm selection is not available yet.
In this work, we close this research gap by conducting a thorough empirical
performance comparison of 24 different quantification methods. To consider a
broad range of different scenarios for binary as well as multiclass
quantification settings, we carried out almost 3 million experimental runs on
40 data sets. We observe that no single algorithm generally outperforms all
competitors, but identify a group of methods including the Median Sweep and the
DyS framework that perform significantly better in binary settings. For the
multiclass setting, we observe that a different, broad group of algorithms
yields good performance, including the Generalized Probabilistic Adjusted
Count, the readme method, the energy distance minimization method, the EM
algorithm for quantification, and Friedman's method. More generally, we find
that performance in multiclass quantification is inferior to that
obtained in the binary setting. Our results can guide practitioners who intend
to apply quantification algorithms and help researchers to identify
opportunities for future research.
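To make the binary task concrete: if a classifier has true- and false-positive rates tpr and fpr, the observed share of positive predictions on a target set with prevalence p is cc = p·tpr + (1 − p)·fpr, so the adjusted count estimates p as (cc − fpr)/(tpr − fpr), clipped to [0, 1]. Median Sweep, one of the strong binary methods named above, takes the median of such adjusted estimates over many decision thresholds. The plain-Python sketch below assumes classifier scores in [0, 1], a fixed threshold grid, and a minimum tpr − fpr gap; the function names, the grid, and the gap value are illustrative choices, not taken from the study.

```python
from statistics import median


def rates(scores, labels, threshold):
    """True- and false-positive rates of the rule 'positive if score >= threshold'."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    tpr = sum(s >= threshold for s in pos) / len(pos)
    fpr = sum(s >= threshold for s in neg) / len(neg)
    return tpr, fpr


def adjusted_count(test_scores, tpr, fpr, threshold):
    """Solve cc = p*tpr + (1 - p)*fpr for p, clipped to [0, 1]."""
    cc = sum(s >= threshold for s in test_scores) / len(test_scores)
    return min(1.0, max(0.0, (cc - fpr) / (tpr - fpr)))


def median_sweep(train_scores, train_labels, test_scores,
                 thresholds=None, min_gap=0.25):
    """Median of adjusted-count estimates over a grid of thresholds.

    Thresholds with tpr - fpr <= min_gap are skipped because the division
    amplifies estimation noise there (the 0.25 cut-off is an assumption).
    """
    if thresholds is None:
        thresholds = [i / 100 for i in range(1, 100)]
    estimates = []
    for t in thresholds:
        tpr, fpr = rates(train_scores, train_labels, t)
        if tpr - fpr > min_gap:
            estimates.append(adjusted_count(test_scores, tpr, fpr, t))
    return median(estimates) if estimates else None


if __name__ == "__main__":
    import random
    rng = random.Random(0)

    def draw(n, prevalence):
        """Synthetic labels and scores: positives score higher on average."""
        labels = [1 if rng.random() < prevalence else 0 for _ in range(n)]
        scores = [min(1.0, max(0.0, rng.gauss(0.7 if y else 0.3, 0.15))) for y in labels]
        return scores, labels

    train_scores, train_labels = draw(2000, 0.5)
    test_scores, test_labels = draw(2000, 0.2)  # class prior shifts between sets
    print("true prevalence:", sum(test_labels) / len(test_labels))
    print("classify and count:", sum(s >= 0.5 for s in test_scores) / len(test_scores))
    print("median sweep:", median_sweep(train_scores, train_labels, test_scores))
```

Under the prior shift in this example, plain classify and count is pulled toward the training prevalence, while the adjusted estimates correct for the classifier's known error rates.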