Fell bundles associated to groupoid morphisms
Given a continuous open surjective morphism of étale
groupoids with amenable kernel, we construct a Fell bundle over and
prove that its C*-algebra is isomorphic to . This is
related to results of Fell concerning C*-algebraic bundles over groups. The
case , a locally compact space, was treated earlier by Ramazan. We
conclude that is strongly Morita equivalent to a crossed product,
the C*-algebra of a Fell bundle arising from an action of the groupoid on a
C*-bundle over . We apply the theory to groupoid morphisms obtained from
extensions of dynamical systems and from morphisms of directed graphs with the
path lifting property. We also prove a structure theorem for abelian Fell
bundles. Comment: 12 pages, revised version, references added; to appear in Mathematica
Scandinavica
Cluster analysis for physical oceanographic data and oceanographic surveys in Turkish seas
Cluster analysis is a useful data mining method for obtaining detailed information on the physical state of the ocean. The primary objective of this study is to develop a new spatio-temporal density-based algorithm for clustering physical oceanographic data. The study extends regular spatial cluster analysis to handle spatial data at different epochs. It also presents a sensitivity analysis that identifies how the new algorithm responds to variations in input parameter values and boundary conditions. To demonstrate the algorithm, the paper presents two oceanographic applications that cluster sea-surface temperature (SST) and sea-surface height residual (SSH) data from satellite observations of the Turkish Seas. It also evaluates and justifies the clustering results using a cluster validation technique.
An Impossibility Result for High Dimensional Supervised Learning
We study high-dimensional asymptotic performance limits of binary supervised
classification problems where the class conditional densities are Gaussian with
unknown means and covariances and the number of signal dimensions scales faster
than the number of labeled training samples. We show that the Bayes error,
namely the minimum attainable error probability with complete distributional
knowledge and equally likely classes, can be arbitrarily close to zero and yet
the limiting minimax error probability of every supervised learning algorithm
is no better than a random coin toss. In contrast to related studies where the
classification difficulty (Bayes error) is made to vanish, we hold it constant
when taking high-dimensional limits. In contrast to VC-dimension based minimax
lower bounds that consider the worst case error probability over all
distributions that have a fixed Bayes error, our worst case is over the family
of Gaussian distributions with constant Bayes error. We also show that a
nontrivial asymptotic minimax error probability can only be attained for
parametric subsets of zero measure (in a suitable measure space). These results
expose the fundamental importance of prior knowledge and suggest that unless we
impose strong structural constraints, such as sparsity, on the parametric
space, supervised learning may be ineffective in high dimensional small sample
settings. Comment: This paper was submitted to the IEEE Information Theory Workshop
(ITW) 2013 on April 23, 201
Ensemble Methods in Environmental Data Mining
Environmental data mining is the nontrivial process of identifying valid, novel, and potentially useful patterns in data from the environmental sciences. This chapter proposes ensemble methods for environmental data mining that combine the outputs of multiple classification models to obtain better results than any individual model could produce. The study focuses on several ensemble strategies in addition to the standard single classifiers commonly used in the literature, such as decision tree, naive Bayes, support vector machine, and k-nearest neighbor (KNN). This is the first study that compares four ensemble strategies for environmental data mining: (i) bagging, (ii) bagging combined with random feature subset selection (the random forest algorithm), (iii) boosting (the AdaBoost algorithm), and (iv) voting of different algorithms. In the experimental studies, the ensemble methods are tested on real-world environmental datasets covering subjects such as air, ecology, rainfall, and soil.
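The fourth strategy, voting of different algorithms, can be illustrated with a minimal sketch. It assumes each base classifier has already produced one label per sample; the model names and the labels below are hypothetical and do not come from the chapter.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one label per sample by plurality vote."""
    combined = []
    for sample_votes in zip(*predictions):
        counts = Counter(sample_votes)
        # most_common sorts by count; on a tie the label seen first wins.
        combined.append(counts.most_common(1)[0][0])
    return combined


# Hypothetical outputs of three base classifiers on four samples.
model_outputs = [
    ["rain", "dry", "rain", "dry"],   # e.g. a decision tree
    ["rain", "rain", "rain", "dry"],  # e.g. naive Bayes
    ["dry", "dry", "rain", "dry"],    # e.g. KNN
]
ensemble = majority_vote(model_outputs)
print(ensemble)
```

Each sample receives the label that most base models agree on, so a single model's mistake on a sample is outvoted when the others are correct.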
HIERARCHICAL CLUSTERING ENSEMBLE WITH DIFFERENT LINKAGE METHODS
Clustering ensembles have become a preferred technique in recent years because they deliver high clustering performance. This study proposes a new approach named the Linkage-based Hierarchical Clustering Ensemble (BHKT, from its Turkish name). In the proposed approach, the ensemble members perform hierarchical clustering using different linkage methods and then produce a joint decision by majority voting. The linkage methods used in the study are single linkage, complete linkage, average linkage, centroid linkage, Ward's method, the neighbor joining method, and adjusted complete linkage. The study also examines hierarchical clustering ensembles of different sizes and compares them with one another. In the experimental studies, the hierarchical clustering ensembles were applied to 8 different datasets and achieved better results than a single clustering algorithm.
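A minimal sketch of the idea, not the authors' implementation: three members cluster one-dimensional points with single, complete, and average linkage, and the results are merged by majority vote. The label-alignment step (`align`), the toy data, and all function names are assumptions made for illustration; the abstract does not say how cluster labels are matched across members.

```python
from collections import Counter
from itertools import permutations


def agglomerate(points, k, linkage):
    """Merge clusters bottom-up until k remain; return one label per point."""
    clusters = [[i] for i in range(len(points))]

    def dist(a, b):
        pair = [abs(points[i] - points[j]) for i in a for j in b]
        if linkage == "single":
            return min(pair)
        if linkage == "complete":
            return max(pair)
        return sum(pair) / len(pair)  # average linkage

    while len(clusters) > k:
        # Find the closest pair of clusters under the chosen linkage.
        a, b = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[a] += clusters[b]
        del clusters[b]

    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def align(reference, labels, k):
    """Relabel `labels` with the permutation that best agrees with `reference`.

    Brute force over k! permutations, so only suitable for small k.
    """
    best = max(
        permutations(range(k)),
        key=lambda p: sum(p[l] == r for l, r in zip(labels, reference)),
    )
    return [best[l] for l in labels]


def ensemble_cluster(points, k, linkages=("single", "complete", "average")):
    runs = [agglomerate(points, k, m) for m in linkages]
    aligned = [runs[0]] + [align(runs[0], r, k) for r in runs[1:]]
    # Majority vote per point across the aligned members.
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*aligned)]


points = [1.0, 1.2, 1.1, 8.0, 8.3, 8.1]  # hypothetical toy data
labels = ensemble_cluster(points, k=2)
print(labels)
```

Cluster labels are arbitrary per member, so some form of alignment is needed before votes can be counted; the permutation search here is one simple option for small numbers of clusters.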
Data Mining in Banking Sector Using Weighted Decision Jungle Method
Classification, one of the most popular data mining techniques, has been used in the banking sector for purposes such as bank customer churn prediction, credit approval, fraud detection, bank failure estimation, and bank telemarketing prediction. However, traditional classification algorithms do not take the class distribution into account, which results in poor performance on imbalanced banking data. To solve this problem, this paper proposes an approach that improves the decision jungle (DJ) method with a class-based weighting mechanism. Experiments conducted on 17 real-world bank datasets show that the proposed approach outperforms the decision jungle method when handling imbalanced banking data.