The ABACOC Algorithm: a Novel Approach for Nonparametric Classification of Data Streams
Stream mining poses unique challenges to machine learning: predictive models
must be scalable and incrementally trainable, must remain bounded in size
(even when the data stream is arbitrarily long), and must be nonparametric in
order to achieve high accuracy even in complex and dynamic environments.
Moreover, the learning system must be parameterless (traditional tuning
methods are problematic in streaming settings) and must not require prior
knowledge of the number of distinct class labels occurring in the stream. In
this paper, we introduce a new algorithmic approach for nonparametric learning
in data streams. Our approach addresses all of the above challenges by
learning a model that covers the input space using simple local classifiers.
The distribution of these classifiers dynamically adapts to the local (unknown)
complexity of the classification problem, thus achieving a good balance between
model complexity and predictive accuracy. We design four variants of our
approach, with increasing degrees of adaptivity. By means of an extensive empirical
evaluation against standard nonparametric baselines, we show state-of-the-art
results in terms of accuracy versus model size. For the variant that imposes a
strict bound on the model size, we show better performance than all other
methods at the same model size. Our empirical analysis is
complemented by a theoretical performance guarantee that does not rely on any
stochastic assumption on the source generating the stream.
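To make the idea of "covering the input space using simple local classifiers" concrete, here is a minimal sketch, assuming fixed-radius balls and Euclidean distance. It is an illustrative simplification, not the ABACOC algorithm: the class name BallCover, the radius and max_balls parameters, and the oldest-first eviction policy are all hypothetical choices made for this example. Each ball stores a center and a label; a new ball is opened whenever an incoming point is not covered or is misclassified, and labels can be any hashable value, so no prior knowledge of the number of classes is needed.

```python
from __future__ import annotations

import numpy as np


class BallCover:
    """Hypothetical fixed-radius ball-cover classifier for a labeled stream.

    Each ball is a trivial local classifier: a center plus the label it predicts.
    This is an illustrative simplification, not the ABACOC algorithm.
    """

    def __init__(self, radius: float, max_balls: int | None = None):
        self.radius = radius
        self.max_balls = max_balls  # optional hard budget on model size
        self.centers: list[np.ndarray] = []
        self.labels: list = []  # any hashable labels; no class count needed

    def _nearest(self, x: np.ndarray) -> tuple[int, float]:
        # Index of, and distance to, the closest ball center.
        dists = [float(np.linalg.norm(x - c)) for c in self.centers]
        i = int(np.argmin(dists))
        return i, dists[i]

    def predict(self, x):
        """Label of the nearest ball, or None before any data has been seen."""
        if not self.centers:
            return None
        x = np.asarray(x, dtype=float)
        return self.labels[self._nearest(x)[0]]

    def learn(self, x, y):
        """Predict-then-update step: returns the prediction made before seeing y."""
        x = np.asarray(x, dtype=float)
        y_hat = self.predict(x)
        if y_hat is None:
            self._add(x, y)
        else:
            _, d = self._nearest(x)
            # Open a new local classifier if x is not covered or was misclassified.
            if d > self.radius or y_hat != y:
                self._add(x, y)
        return y_hat

    def _add(self, x: np.ndarray, y) -> None:
        if self.max_balls is not None and len(self.centers) >= self.max_balls:
            # Budget reached: evict the oldest ball (one simple policy among many).
            self.centers.pop(0)
            self.labels.pop(0)
        self.centers.append(x)
        self.labels.append(y)
```

Setting max_balls keeps memory bounded however long the stream runs, which mirrors the bounded-size requirement above. The variants described in the abstract also adapt the distribution of local classifiers to the local complexity of the problem, which this fixed-radius sketch does not attempt.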
Image patch analysis of sunspots and active regions. I. Intrinsic dimension and correlation analysis
The flare-productivity of an active region is observed to be related to its
spatial complexity. Mount Wilson or McIntosh sunspot classifications measure
such complexity but in a categorical way, and may therefore not use all the
information present in the observations. Moreover, such categorical schemes
hinder, for example, a systematic study of an active region's evolution. We
propose fine-scale quantitative descriptors for an active region's complexity
and relate them to the Mount Wilson classification. We analyze the local
correlation structure within continuum and magnetogram data, as well as the
cross-correlation between continuum and magnetogram data. We compute the
intrinsic dimension and partial correlations of image patches, and apply
canonical correlation analysis (CCA) to them, using continuum and magnetogram
active region images taken by the SOHO-MDI instrument. We use masks of sunspots derived from continuum
as well as larger masks of magnetic active regions derived from the magnetogram
to analyze the core of an active region separately from its surroundings.
We find a relationship between the complexity of an active region, as
measured by the Mount Wilson classification, and the intrinsic dimension of its image patches.
Partial correlation patterns exhibit approximately a third-order Markov
structure. CCA reveals different patterns of correlation between continuum and
magnetogram within the sunspots and in the region surrounding the sunspots.
These results also pave the way for patch-based dictionary learning with a view
towards automatic clustering of active regions.
Comment: Accepted for publication in the Journal of Space Weather and Space
Climate (SWSC). 23 pages, 11 figures.
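As a rough illustration of estimating the intrinsic dimension of image patches, the sketch below extracts all square patches from a 2-D array and counts how many principal components are needed to explain a fixed fraction of the variance. The PCA variance threshold (95%), the patch size, and the function names extract_patches and pca_intrinsic_dimension are assumptions for this example; the paper's estimator and its masking of sunspot and active-region pixels may differ.

```python
import numpy as np


def extract_patches(img: np.ndarray, size: int) -> np.ndarray:
    """Return every size x size patch of a 2-D array, flattened to one row each."""
    h, w = img.shape
    rows = [img[i:i + size, j:j + size].ravel()
            for i in range(h - size + 1)
            for j in range(w - size + 1)]
    return np.array(rows)


def pca_intrinsic_dimension(patches: np.ndarray, var_fraction: float = 0.95) -> int:
    """Smallest number of principal components explaining var_fraction of the variance."""
    X = patches - patches.mean(axis=0)
    # Squared singular values of the centered data are proportional to PCA eigenvalues.
    s = np.linalg.svd(X, compute_uv=False)
    total = np.sum(s ** 2)
    if total == 0.0:
        return 0  # all patches identical after centering: no variance to explain
    explained = np.cumsum(s ** 2) / total
    return int(np.searchsorted(explained, var_fraction) + 1)
```

Applied to patches drawn from a more complex region, a larger estimate indicates that more independent directions are needed to describe local structure, which is the kind of quantity one would compare against the Mount Wilson class.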
Review: Running and Comparing Data Stream Clustering Algorithms
12th Turkish National Software Engineering Symposium, UYMS 2018; Istanbul, Turkey; 10 to 12 September 2018
Recently, clustering data streams has become an important research area for knowledge discovery, as applications produce ever more continuous, unbounded streaming data. In this paper we introduce clustering, data streams, and data stream clustering algorithms, and discuss the most important stream clustering algorithms with attention to their structure. As an additional contribution, and unlike existing review and survey papers on stream clustering, we present a practical evaluation of the best-known stream clustering algorithms, namely (i) CluStream, (ii) DenStream, (iii) D-Stream, and (iv) ClusTree, reporting their experimental results along with several performance metrics computed for each, using the MOA Java framework.
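Several of these algorithms summarize the stream with micro-clusters rather than storing raw points. The sketch below shows the additive cluster-feature idea used by CluStream, assuming only the spatial components (point count N, linear sum LS, per-dimension squared sum SS); CluStream also keeps temporal sums and runs a separate offline macro-clustering phase, both omitted here. The class name MicroCluster is hypothetical and is not the MOA API.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class MicroCluster:
    """Simplified cluster-feature summary (N, LS, SS) of the points absorbed so far."""
    n: int
    ls: np.ndarray  # linear sum of the points
    ss: np.ndarray  # per-dimension sum of squared coordinates

    @classmethod
    def from_point(cls, x) -> "MicroCluster":
        x = np.asarray(x, dtype=float)
        return cls(1, x.copy(), x * x)

    def insert(self, x) -> None:
        # Absorbing a point only updates three running sums: O(d) time and memory.
        x = np.asarray(x, dtype=float)
        self.n += 1
        self.ls += x
        self.ss += x * x

    def merge(self, other: "MicroCluster") -> "MicroCluster":
        # Cluster features are additive, so merging is elementwise addition.
        return MicroCluster(self.n + other.n, self.ls + other.ls, self.ss + other.ss)

    @property
    def centroid(self) -> np.ndarray:
        return self.ls / self.n

    @property
    def radius(self) -> float:
        # RMS deviation from the centroid, recovered from the sums alone.
        var = self.ss / self.n - self.centroid ** 2
        return float(np.sqrt(np.maximum(var, 0.0).sum()))
```

Because the summary never stores individual points, memory per micro-cluster stays constant regardless of how many points it absorbs, which is what makes this representation suitable for unbounded streams.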
Modeling meander morphodynamics over self-formed heterogeneous floodplains
This work addresses the signatures embedded in the planform geometry of meandering rivers that result from the floodplain heterogeneities formed as river bends migrate. Two geomorphic features are specifically considered: scroll bars, produced by lateral accretion of point bars at convex banks, and oxbow lake fills, resulting from neck cutoffs. The sedimentary architecture of these geomorphic units depends on the type and amount of sediment and controls bank erodibility as the river impinges on them, either favoring or hindering river migration. The geometry of numerically generated planforms obtained for different scenarios of floodplain heterogeneity is compared with that of natural meandering paths. Half-meander metrics and the spatial distribution of channel curvature are used to reveal the complexity embedded in meandering geometry. Fourier Analysis, Principal Component Analysis, Singular Spectrum Analysis and Multivariate Singular Spectrum Analysis are used to highlight the subtle but crucial differences that may emerge between apparently similar configurations. Observed and simulated planforms are more similar when flow and sediment dynamics are fully coupled (fully-coupled models) and when the self-formed heterogeneities are less erodible than the surrounding floodplain.
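The channel-curvature series mentioned above can be computed from centerline coordinates with the standard parametric formula κ = (x'y'' − y'x'') / (x'² + y'²)^(3/2). The sketch below is a minimal version, assuming the centerline is given as ordered arrays of x and y coordinates; because the formula is independent of parametrization, differentiating with respect to the sample index is valid even when points are not equally spaced in arc length. The function name centerline_curvature is hypothetical, and real river centerlines would normally be smoothed before differentiating.

```python
import numpy as np


def centerline_curvature(x, y) -> np.ndarray:
    """Signed curvature of a planar centerline sampled at points (x[i], y[i]).

    Positive values mean the path turns counterclockwise (to the left).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # First and second derivatives with respect to the sample index
    # (central differences in the interior, one-sided at the ends).
    dx, dy = np.gradient(x), np.gradient(y)
    ddx, ddy = np.gradient(dx), np.gradient(dy)
    return (dx * ddy - dy * ddx) / (dx ** 2 + dy ** 2) ** 1.5
```

For a circle of radius 5 traversed counterclockwise, the interior values equal 1/5 = 0.2; the first and last two samples are less accurate because np.gradient switches to one-sided differences at the boundaries.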
On landmark selection and sampling in high-dimensional data analysis
In recent years, the spectral analysis of appropriately defined kernel
matrices has emerged as a principled way to extract the low-dimensional
structure often prevalent in high-dimensional data. Here we provide an
introduction to spectral methods for linear and nonlinear dimension reduction,
emphasizing ways to overcome the computational limitations currently faced by
practitioners with massive datasets. In particular, a data subsampling or
landmark selection process is often employed to construct a kernel based on
partial information, followed by an approximate spectral analysis termed the
Nyström extension. We provide a quantitative framework to analyse this
procedure, and use it to demonstrate algorithmic performance bounds on a range
of practical approaches designed to optimize the landmark selection process. We
compare the practical implications of these bounds by way of real-world
examples drawn from the field of computer vision, whereby low-dimensional
manifold structure is shown to emerge from high-dimensional video data streams.
Comment: 18 pages, 6 figures, submitted for publication
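The landmark-plus-Nyström procedure described above can be sketched in a few lines: evaluate the kernel only between all points and a subset of m landmarks (matrix C), take the landmark-landmark block W, and approximate the full kernel as K ≈ C W⁺ Cᵀ. This sketch assumes uniform sampling of landmarks without replacement, which is only the simplest selection scheme; the optimized selection strategies analysed in the paper are not implemented here. The Gaussian kernel, its gamma parameter, and the function names rbf_kernel and nystrom are assumptions for illustration.

```python
import numpy as np


def rbf_kernel(A: np.ndarray, B: np.ndarray, gamma: float) -> np.ndarray:
    """Gaussian kernel exp(-gamma * ||a - b||^2) between rows of A and rows of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negative round-off


def nystrom(X: np.ndarray, n_landmarks: int, kernel, rng=None):
    """Nyström approximation K ~= C W^+ C^T using uniformly sampled landmarks.

    Returns the approximate n x n kernel matrix and the landmark indices.
    """
    rng = np.random.default_rng(rng)
    idx = rng.choice(len(X), size=n_landmarks, replace=False)
    C = kernel(X, X[idx])  # n x m: all points against the landmarks
    W = C[idx]             # m x m: landmarks against landmarks
    K_approx = C @ np.linalg.pinv(W) @ C.T
    return K_approx, idx
```

The sketch forms the full n x n matrix only for clarity; the saving comes from needing just n x m kernel evaluations and an m x m pseudo-inverse, so in practice one would keep C and W⁺ as factors. When the kernel matrix has rank r and the landmarks span the same r-dimensional space, the approximation is exact, which is why landmark choice matters for low-dimensional structure.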