422 research outputs found

    Fell bundles associated to groupoid morphisms

    Full text link
    Given a continuous open surjective morphism π:GH\pi :G\to H of \'etale groupoids with amenable kernel, we construct a Fell bundle EE over HH and prove that its C*-algebra Cr(E)C^*_r(E) is isomorphic to Cr(G)C^*_r(G). This is related to results of Fell concerning C*-algebraic bundles over groups. The case H=XH=X, a locally compact space, was treated earlier by Ramazan. We conclude that Cr(G)C^*_r(G) is strongly Morita equivalent to a crossed product, the C*-algebra of a Fell bundle arising from an action of the groupoid HH on a C*-bundle over H0H^0. We apply the theory to groupoid morphisms obtained from extensions of dynamical systems and from morphisms of directed graphs with the path lifting property. We also prove a structure theorem for abelian Fell bundles.Comment: 12 pages, revised version, references added; to appear in Mathematica Scandinavic

    Data Mining Using RFM Analysis

    Get PDF

    Service-Oriented Data Mining

    Get PDF

    An Impossibility Result for High Dimensional Supervised Learning

    Full text link
    We study high-dimensional asymptotic performance limits of binary supervised classification problems where the class conditional densities are Gaussian with unknown means and covariances and the number of signal dimensions scales faster than the number of labeled training samples. We show that the Bayes error, namely the minimum attainable error probability with complete distributional knowledge and equally likely classes, can be arbitrarily close to zero and yet the limiting minimax error probability of every supervised learning algorithm is no better than a random coin toss. In contrast to related studies where the classification difficulty (Bayes error) is made to vanish, we hold it constant when taking high-dimensional limits. In contrast to VC-dimension based minimax lower bounds that consider the worst case error probability over all distributions that have a fixed Bayes error, our worst case is over the family of Gaussian distributions with constant Bayes error. We also show that a nontrivial asymptotic minimax error probability can only be attained for parametric subsets of zero measure (in a suitable measure space). These results expose the fundamental importance of prior knowledge and suggest that unless we impose strong structural constraints, such as sparsity, on the parametric space, supervised learning may be ineffective in high dimensional small sample settings.Comment: This paper was submitted to the IEEE Information Theory Workshop (ITW) 2013 on April 23, 201

    Cluster analysis for physical oceanographic data and oceanographic surveys in Turkish seas

    Get PDF
    Cluster analysis is a useful data mining method to obtain detailed information on the physical state of the ocean. The primary objective of this study is the development of a new spatio-temporal density-based algorithm for clustering physical oceanographic data. This study extends the regular spatial cluster analysis to deal with spatial data at different epochs. It also presents the sensitivity of the new algorithm to different parameter settings. The purpose of the sensitivity analysis presented in this paper is to identify the response of the algorithm to variations in input parameter values and boundary conditions. In order to demonstrate the usage of the new algorithm, this paper presents two oceanographic applications that cluster the sea-surface temperature (SST) and the sea-surface height residual (SSH) data which records the satellite observations of the Turkish Seas. It also evaluates and justifies the clustering results by using a cluster validation technique

    Ensemble Methods in Environmental Data Mining

    Get PDF
    Environmental data mining is the nontrivial process of identifying valid, novel, and potentially useful patterns in data from environmental sciences. This chapter proposes ensemble methods in environmental data mining that combines the outputs from multiple classification models to obtain better results than the outputs that could be obtained by an individual model. The study presented in this chapter focuses on several ensemble strategies in addition to the standard single classifiers such as decision tree, naive Bayes, support vector machine, and k-nearest neighbor (KNN), popularly used in literature. This is the first study that compares four ensemble strategies for environmental data mining: (i) bagging, (ii) bagging combined with random feature subset selection (the random forest algorithm), (iii) boosting (the AdaBoost algorithm), and (iv) voting of different algorithms. In the experimental studies, ensemble methods are tested on different real-world environmental datasets in various subjects such as air, ecology, rainfall, and soil

    FARKLI BAĞLANTI YÖNTEMLERİ İLE HİYERARŞİK KÜMELEME TOPLULUĞU

    Get PDF
    Kümeleme topluluğu, yüksek kümeleme performansı sağlaması nedeniyle son yıllarda tercih edilen bir teknik haline gelmiştir. Bu çalışmada, Bağlantı-tabanlı Hiyerarşik Kümeleme Topluluğu (BHKT) olarak isimlendirilen yeni bir yaklaşım önerilmektedir. Önerilen yaklaşımda, topluluk elemanları farklı bağlantı yöntemleri kullanarak hiyerarşik kümeleme yapmakta ve sonrasında çoğunluk oylaması ile ortak karar üretmektedir. Çalışmada kullanılan bağlantı yöntemleri: tek bağlantı, tam bağlantı, ortalama bağlantı, merkez bağlantı, Ward yöntemi, komşu birleştirme yöntemi ve ayarlı tam bağlantıdır. Ayrıca çalışmada, farklı boyutlardaki hiyerarşik kümeleme toplulukları incelenmiş ve birbiriyle karşılaştırılmıştır. Deneysel çalışmalarda, hiyerarşik kümeleme toplulukları 8 farklı veri setinde uygulanmış ve tek bir kümeleme algoritmasına göre daha iyi sonuçlar elde edilmiştir

    Data Mining in Banking Sector Using Weighted Decision Jungle Method

    Get PDF
    Classification, as one of the most popular data mining techniques, has been used in the banking sector for different purposes, for example, for bank customer churn prediction, credit approval, fraud detection, bank failure estimation, and bank telemarketing prediction. However, traditional classification algorithms do not take into account the class distribution, which results into undesirable performance on imbalanced banking data. To solve this problem, this paper proposes an approach which improves the decision jungle (DJ) method with a class-based weighting mechanism. The experiments conducted on 17 real-world bank datasets show that the proposed approach outperforms the decision jungle method when handling imbalanced banking data
    corecore