23,465 research outputs found

Efficient Iterative Processing in the SciDB Parallel Array Engine

Authors: Balazinska Magdalena; Connolly Andrew; Krughoff Simon; Soroush Emad
Publication date: 31/05/2015

Many scientific data-intensive applications perform iterative computations on array data. There exist multiple engines specialized for array processing. These engines efficiently support various types of operations, but none includes native support for iterative processing. In this paper, we develop a model for iterative array computations and a series of optimizations. We evaluate the benefits of an optimized, native support for iterative array processing on the SciDB engine and real workloads from the astronomy domain.

arXiv.org e-Print Archive

CiteSeerX

Crossref

Anytime Hierarchical Clustering

Authors: Arslan Omur; Koditschek Daniel E.
Publication date: 13/04/2014

We propose a new anytime hierarchical clustering method that iteratively transforms an arbitrary initial hierarchy on the configuration of measurements along a sequence of trees that, as we prove, must terminate for a fixed data set in a chain of nested partitions satisfying a natural homogeneity requirement. Each recursive step re-edits the tree so as to improve a local measure of cluster homogeneity that is compatible with a number of commonly used (e.g., single, average, complete) linkage functions. As an alternative to the standard batch algorithms, we present numerical evidence to suggest that appropriate adaptations of this method can yield decentralized, scalable algorithms suitable for distributed/parallel computation of clustering hierarchies and online tracking of clustering trees applicable to large, dynamically changing databases and anomaly detection.

Comment: 13 pages, 6 figures, 5 tables, in preparation for submission to a conference

arXiv.org e-Print Archive

CiteSeerX

ScholarlyCommons@Penn

Data mining: a tool for detecting cyclical disturbances in supply networks.

Authors: Chan F. T. S.; Chatfield C.; Davis T.; Devijver P. A.; Fayyad U. M.; Forrester J. W.; Han J.; Harding J. A.; Jolliffe I. T.; Kaufman L.; Klösgen W.; Koopmans L. H.; Mason-Jones R.; Monostori L.; Pyle D.; Witten I. H.
Publication venue: 'SAGE Publications'
Publication date: 21/12/2007

Disturbances in supply chains may be either exogenous or endogenous. The ability to automatically detect, diagnose, and distinguish between the causes of disturbances is of prime importance to decision makers in order to avoid uncertainty. The spectral principal component analysis (SPCA) technique has been utilized to distinguish between real and rogue disturbances in a steel supply network. The data set used was collected from four different business units in the network and consists of 43 variables, each described by 72 data points. The present paper uses the same data set to test an alternative approach to SPCA for detecting the disturbances. The new approach employs statistical data pre-processing, clustering, and classification learning techniques to analyse the supply network data. In particular, the incremental k-means clustering and the RULES-6 classification rule-learning algorithms, developed by the present authors’ team, have been applied to identify important patterns in the data set. Results show that the proposed approach is capable of automatically detecting and characterizing network-wide cyclical disturbances and of generating hypotheses about their root cause.

Crossref

Middlesex University Research Repository

Adaptive Evolutionary Clustering

Authors: AC Harvey; Alfred O. Hero III; DJ Fenn; GW Milligan; H Lütkepohl; H Ning; HW Kuhn; J Schäfer; J Shi; Kevin S. Xu; M Charikar; Mark Kliger; N Eagle; O Ledoit; PJ Mucha; S Haykin; S Tadepalli; T Hastie; T Yang; TW Anderson; U Luxburg von; Y Chen; Y Chi; YR Lin
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

In many practical applications of clustering, the objects to be clustered evolve over time, and a clustering result is desired at each time step. In such applications, evolutionary clustering typically outperforms traditional static clustering by producing clustering results that reflect long-term trends while being robust to short-term variations. Several evolutionary clustering algorithms have recently been proposed, often by adding a temporal smoothness penalty to the cost function of a static clustering method. In this paper, we introduce a different approach to evolutionary clustering by accurately tracking the time-varying proximities between objects followed by static clustering. We present an evolutionary clustering framework that adaptively estimates the optimal smoothing parameter using shrinkage estimation, a statistical approach that improves a naive estimate using additional information. The proposed framework can be used to extend a variety of static clustering algorithms, including hierarchical, k-means, and spectral clustering, into evolutionary clustering algorithms. Experiments on synthetic and real data sets indicate that the proposed framework outperforms static clustering and existing evolutionary clustering algorithms in many scenarios.

Comment: To appear in Data Mining and Knowledge Discovery, MATLAB toolbox available at http://tbayes.eecs.umich.edu/xukevin/affec

arXiv.org e-Print Archive

CiteSeerX

Crossref
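
The "track the time-varying proximities, then cluster statically" idea in the Adaptive Evolutionary Clustering abstract above can be illustrated with a minimal sketch. It assumes the smoothed proximity matrix is a convex combination of the previous smoothed matrix and the current observation, with a fixed weight `alpha`. That update rule, the function names, and the fixed `alpha` are all assumptions made for illustration; the abstract states only that the smoothing parameter is estimated adaptively by shrinkage, which this sketch does not implement.

```python
def smooth_proximities(prev_smoothed, current, alpha):
    """Blend the previous smoothed matrix with the current proximity matrix.

    Assumed update rule (not taken from the abstract):
        smoothed_t = alpha * smoothed_{t-1} + (1 - alpha) * current_t
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [
        [alpha * p + (1.0 - alpha) * c for p, c in zip(prev_row, cur_row)]
        for prev_row, cur_row in zip(prev_smoothed, current)
    ]


def track(proximity_sequence, alpha):
    """Yield the smoothed proximity matrix at each time step.

    alpha is fixed here (hypothetical simplification); an adaptive scheme
    would re-estimate it at every step.
    """
    smoothed = None
    for current in proximity_sequence:
        if smoothed is None:
            # First time step: nothing to smooth against, copy the input.
            smoothed = [row[:] for row in current]
        else:
            smoothed = smooth_proximities(smoothed, current, alpha)
        yield smoothed


if __name__ == "__main__":
    # Two time steps of a 2x2 similarity matrix (toy data).
    sequence = [
        [[1.0, 0.0], [0.0, 1.0]],
        [[1.0, 1.0], [1.0, 1.0]],
    ]
    for step, matrix in enumerate(track(sequence, alpha=0.5)):
        print(step, matrix)
```

Each yielded matrix could then be passed to any static clustering routine (hierarchical, k-means, or spectral), which is the sense in which a static algorithm is extended to an evolutionary one.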

Graph Summarization

Authors: Bonifati Angela; Dumbrava Stefania; Kondylakis Haridimos
Publication date: 01/04/2020

The continuous and rapid growth of highly interconnected datasets, which are both voluminous and complex, calls for the development of adequate processing and analytical techniques. One method for condensing and simplifying such datasets is graph summarization. It denotes a series of application-specific algorithms designed to transform graphs into more compact representations while preserving structural patterns, query answers, or specific property distributions. As this problem is common to several areas studying graph topologies, different approaches, such as clustering, compression, sampling, or influence detection, have been proposed, primarily based on statistical and optimization methods. Our chapter pinpoints the main graph summarization methods, with particular attention to the most recent approaches and novel research trends on this topic that are not yet covered by previous surveys.

Comment: To appear in the Encyclopedia of Big Data Technologies

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Hal-Diderot