28,308 research outputs found
Detection, tracking and event localization of jet stream features in 4-D atmospheric data
We introduce a novel algorithm for the efficient detection and tracking of features in spatiotemporal atmospheric data, as well as for the precise localization of the occurring genesis, lysis, merging and splitting events. The algorithm works on data given on a four-dimensional structured grid. Feature selection and clustering are based on adjustable local and global criteria, feature tracking is predominantly based on spatial overlaps of the feature's full volumes. The resulting 3-D features and the identified correspondences between features of consecutive time steps are represented as the nodes and edges of a directed acyclic graph, the event graph. Merging and splitting events appear in the event graph as nodes with multiple incoming or outgoing edges, respectively. The precise localization of the splitting events is based on a search for all grid points inside the initial 3-D feature that have a similar distance to two successive 3-D features of the next time step. The merging event is localized analogously, operating backward in time. As a first application of our method we present a climatology of upper-tropospheric jet streams and their events, based on four-dimensional wind speed data from European Centre for Medium-Range Weather Forecasts (ECMWF) analyses. We compare our results with a climatology from a previous study, investigate the statistical distribution of the merging and splitting events, and illustrate the meteorological significance of the jet splitting events with a case study. A brief outlook is given on additional potential applications of the 4-D data segmentation technique
Localized Lasso for High-Dimensional Regression
We introduce the localized Lasso, which is suited for learning models that
are both interpretable and have a high predictive power in problems with high
dimensionality and small sample size . More specifically, we consider a
function defined by local sparse models, one at each data point. We introduce
sample-wise network regularization to borrow strength across the models, and
sample-wise exclusive group sparsity (a.k.a., norm) to introduce
diversity into the choice of feature sets in the local models. The local models
are interpretable in terms of similarity of their sparsity patterns. The cost
function is convex, and thus has a globally optimal solution. Moreover, we
propose a simple yet efficient iterative least-squares based optimization
procedure for the localized Lasso, which does not need a tuning parameter, and
is guaranteed to converge to a globally optimal solution. The solution is
empirically shown to outperform alternatives for both simulated and genomic
personalized medicine data
Spectral Embedding Norm: Looking Deep into the Spectrum of the Graph Laplacian
The extraction of clusters from a dataset which includes multiple clusters
and a significant background component is a non-trivial task of practical
importance. In image analysis this manifests for example in anomaly detection
and target detection. The traditional spectral clustering algorithm, which
relies on the leading eigenvectors to detect clusters, fails in such
cases. In this paper we propose the {\it spectral embedding norm} which sums
the squared values of the first normalized eigenvectors, where can be
significantly larger than . We prove that this quantity can be used to
separate clusters from the background in unbalanced settings, including extreme
cases such as outlier detection. The performance of the algorithm is not
sensitive to the choice of , and we demonstrate its application on synthetic
and real-world remote sensing and neuroimaging datasets
Laplacian Mixture Modeling for Network Analysis and Unsupervised Learning on Graphs
Laplacian mixture models identify overlapping regions of influence in
unlabeled graph and network data in a scalable and computationally efficient
way, yielding useful low-dimensional representations. By combining Laplacian
eigenspace and finite mixture modeling methods, they provide probabilistic or
fuzzy dimensionality reductions or domain decompositions for a variety of input
data types, including mixture distributions, feature vectors, and graphs or
networks. Provable optimal recovery using the algorithm is analytically shown
for a nontrivial class of cluster graphs. Heuristic approximations for scalable
high-performance implementations are described and empirically tested.
Connections to PageRank and community detection in network analysis demonstrate
the wide applicability of this approach. The origins of fuzzy spectral methods,
beginning with generalized heat or diffusion equations in physics, are reviewed
and summarized. Comparisons to other dimensionality reduction and clustering
methods for challenging unsupervised machine learning problems are also
discussed.Comment: 13 figures, 35 reference
- …