Search CORE

462,751 research outputs found

Segregation Indices for Disease Clustering

Author: Ceyhan Elvan
Publication venue
Publication date: 02/10/2013
Field of study

Spatial clustering has important implications in various fields. In particular, disease clustering is of major public concern in epidemiology. In this article, we propose the use of two distance-based segregation indices to test the significance of disease clustering among subjects whose locations are from a homogeneous or an inhomogeneous population. We derive their asymptotic distributions and compare them with other distance-based disease clustering tests in terms of empirical size and power by extensive Monte Carlo simulations. The null pattern we consider is the random labeling (RL) of cases and controls to the given locations. Along this line, we investigate the sensitivity of the size of these tests to the underlying background pattern (e.g., clustered or homogeneous) on which the RL is applied, to the level of clustering and number of clusters, and to differences in the relative abundances of the classes. We demonstrate that differences in relative abundance have the highest impact on the empirical sizes of the tests. We also propose various non-RL patterns as alternatives to the RL pattern and assess the empirical power of the tests under these alternatives. We illustrate the methods on two real-life examples from epidemiology.
Comment: 31 pages, 13 figures, 3 tables

arXiv.org e-Print Archive

CiteSeerX
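The random labeling (RL) null can be illustrated with a small Monte Carlo permutation test. This is a minimal sketch, not the segregation indices proposed in the article: as an assumed stand-in statistic it counts cases whose nearest neighbor is also a case (similar in spirit to the Cuzick-Edwards test with k = 1), keeps the locations fixed, and permutes the case/control labels. The function names and the toy data are hypothetical.

```python
import math
import random


def nearest_neighbors(points):
    """Index of each point's nearest neighbor (Euclidean distance, brute force O(n^2))."""
    nn = []
    for i, (xi, yi) in enumerate(points):
        best_j, best_d = -1, math.inf
        for j, (xj, yj) in enumerate(points):
            if i != j:
                d = math.hypot(xi - xj, yi - yj)
                if d < best_d:
                    best_j, best_d = j, d
        nn.append(best_j)
    return nn


def case_case_count(nn, labels):
    """Number of cases (label 1) whose nearest neighbor is also a case."""
    return sum(1 for i, j in enumerate(nn) if labels[i] == 1 and labels[j] == 1)


def random_labeling_test(points, labels, n_sim=999, seed=0):
    """One-sided Monte Carlo p-value for case clustering under random labeling (RL)."""
    rng = random.Random(seed)
    nn = nearest_neighbors(points)  # depends only on the fixed locations
    observed = case_case_count(nn, labels)
    perm = list(labels)
    as_extreme = 0
    for _ in range(n_sim):
        rng.shuffle(perm)  # RL: permute labels, keeping locations and the number of cases fixed
        if case_case_count(nn, perm) >= observed:
            as_extreme += 1
    return observed, (as_extreme + 1) / (n_sim + 1)


# Toy data: ten cases packed along a short segment, 36 controls on a distant grid
cases = [(0.01 * i, 0.0) for i in range(10)]
controls = [(10.0 + 2 * a, 2.0 * b) for a in range(6) for b in range(6)]
observed, p_value = random_labeling_test(cases + controls, [1] * 10 + [0] * 36)
print(observed, p_value)
```

Because the labels, not the locations, are resampled, any clustering of the background pattern itself is conditioned on, which is the property that makes RL a natural null for case-control data.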

Frequent-pattern based iterative projected clustering

Author: Mamoulis N
Yiu ML
Publication venue: IEEE, Computer Society.
Publication date: 01/01/2003
Field of study

Irrelevant attributes add noise to high-dimensional clusters and make traditional clustering techniques inappropriate. Projected clustering algorithms have been proposed to find clusters in hidden subspaces. We recognize the analogy between mining frequent itemsets and discovering the relevant subspace for a given cluster. We propose a methodology for finding projected clusters by mining frequent itemsets and present heuristics that improve its quality. Our techniques are evaluated with synthetic and real data; they are scalable and discover projected clusters accurately. © 2003 IEEE.

CiteSeerX

HKU Scholars Hub
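The analogy between frequent itemsets and relevant subspaces can be sketched as follows, under stated assumptions: each point becomes a "transaction" listing the dimensions in which it lies within a hypothetical `width` of a cluster medoid, and dimension sets that are frequent across the cluster's points are candidate subspaces. The Apriori-style miner, the parameter names, and the rule of picking the largest frequent itemset are illustrative simplifications, not the authors' algorithm or heuristics.

```python
from itertools import combinations


def to_transactions(points, medoid, width):
    """Encode each point as the set of dimensions where it lies within `width` of the medoid."""
    return [frozenset(d for d, (x, m) in enumerate(zip(p, medoid)) if abs(x - m) <= width)
            for p in points]


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) mining of itemsets whose support fraction is >= min_support."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    items = sorted({d for t in transactions for d in t})
    level = [frozenset([d]) for d in items if support(frozenset([d])) >= min_support]
    frequent = list(level)
    k = 2
    while level:
        prev = set(level)
        # Join: unite pairs of frequent (k-1)-itemsets that differ in exactly one item
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        level = [c for c in candidates if support(c) >= min_support]
        frequent.extend(level)
        k += 1
    return frequent


def relevant_subspace(points, medoid, width, min_support):
    """Hypothetical rule: take the largest frequent dimension set as the cluster's subspace."""
    found = frequent_itemsets(to_transactions(points, medoid, width), min_support)
    return max(found, key=len) if found else frozenset()


# Toy 4-D cluster: tight in dimensions 0 and 2, spread out in dimensions 1 and 3
cluster = [
    (0.1, 5.0, -0.1, -7.0),
    (-0.2, -3.0, 0.05, 9.0),
    (0.0, 8.0, 0.2, 4.0),
    (0.15, -6.0, -0.2, -2.0),
    (0.3, 0.4, 0.1, 6.0),
]
print(relevant_subspace(cluster, medoid=(0.0, 0.0, 0.0, 0.0), width=0.5, min_support=0.6))
# frozenset({0, 2})
```

Support plays the role of "how many cluster members agree on these dimensions", so the Apriori property (a frequent set has only frequent subsets) prunes the search over subspaces the same way it prunes itemsets.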

Modeling Asymmetric Volatility Clusters Using Copulas and High Frequency Data

Author: Cathy Ning
Dinghai Xu
Tony Wirjanto
Publication venue
Publication date
Field of study

Volatility clustering is a well-known stylized feature of financial asset returns. In this paper, we investigate the asymmetric pattern of volatility clustering in both the stock and foreign exchange markets. To this end, we employ copula-based semi-parametric univariate time-series models that accommodate clusters of both large and small volatilities in the analysis. Using daily realized volatilities of individual company stocks, stock indices and foreign exchange rates constructed from high-frequency data, we find that volatility clustering is strongly asymmetric, in the sense that clusters of large volatilities tend to be much stronger than those of small volatilities. In addition, the asymmetric pattern of volatility clusters remains visible even when the clusters are allowed to change over time, and the volatility clusters themselves remain persistent even after forty days.
Keywords: Volatility clustering, Copulas, Realized volatility, High-frequency data.
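One simple, model-free way to look at the asymmetry described above is to move realized volatilities to the copula scale with ranks and compare how often consecutive days are jointly in the lower tail versus jointly in the upper tail. This is a sketch under assumptions, not the paper's copula-based semi-parametric time-series model: the threshold `q`, the lag, and the co-exceedance ratio are illustrative choices, and the series in the usage lines is synthetic.

```python
def pseudo_observations(x):
    """Rank-transform a series onto (0, 1): u_i = rank_i / (n + 1), ties broken by position."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u


def tail_coexceedance(rv, q=0.1, lag=1):
    """Empirical lower/upper tail co-exceedance ratios of (RV_{t-lag}, RV_t) on the copula scale.

    Under independence both ratios are close to q; values near 1 indicate strong tail clustering.
    """
    u = pseudo_observations(rv)
    pairs = list(zip(u[:-lag], u[lag:]))
    m = len(pairs)
    lower = sum(1 for a, b in pairs if a <= q and b <= q) / (m * q)
    upper = sum(1 for a, b in pairs if a > 1 - q and b > 1 - q) / (m * q)
    return lower, upper


# Synthetic series: the ten largest values form one run, the ten smallest are isolated
rv = [50.0 + (i % 7) for i in range(100)]  # ordinary days
for i in range(10):
    rv[i] = 1000.0 + i
for k, pos in enumerate(range(14, 100, 9)):
    rv[pos] = float(k)
lower, upper = tail_coexceedance(rv, q=0.1)
print(lower, round(upper, 2))  # 0.0 0.91
```

An upper ratio well above the lower one is the kind of pattern the abstract describes as large-volatility clusters being stronger than small-volatility clusters; a fitted copula model would replace these raw counts with parametric tail behavior.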

Floodplain connectivity, disturbance and change: a palaeoentomological investigation of floodplain ecology from south-west England

Author: Brown A.G.
Davis S.R.
Dinnin M.H.
Publication venue: Wiley
Publication date: 01/03/2007
Field of study

1. Floodplain environments are increasingly subject to enhancement and restoration, with the purpose of increasing their biodiversity and returning them to a more 'natural' state. Defining such a state based solely on neoecological data is problematic and has led several authors to suggest a palaeoecological approach.
2. Fossil coleopteran assemblages recovered from multiple palaeochannel fills in south-west England were used to investigate past floodplain and channel characteristics during the mid- to late Holocene. Ordination of the coleopteran data using Detrended Correspondence Analysis (DCA) produced clear, discrete clustering. This clustering pattern is related to the nature of the environment in which the assemblages were deposited, and hence to channel configuration and dynamics.
3. The DCA clustering pattern is strongly related to measures of ecological evenness, and a strong relationship was revealed between these indices and the composition of the water beetle assemblage within samples. Repeating the ordination with presence–absence data results in a similar clustering pattern, implying that assemblage composition is crucial in determining cluster placement.
4. As assemblage composition is primarily a function of floodplain topography and hence disturbance regime, we attempt to relate these data to the Intermediate Disturbance Hypothesis (IDH). A significant positive correlation was found between ecological diversity (Shannon's H') and Axis 1 of all ordinations in predominantly aquatic assemblages

Southampton (e-Prints Soton)
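The diversity and evenness measures mentioned above are easy to compute from taxon counts. The sketch below assumes Shannon's H' with natural logarithms and uses Pielou's J' as one possible evenness index; the abstract does not say which evenness measures were used, the counts are hypothetical, and the DCA ordination itself is not reproduced.

```python
import math


def shannon_h(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def pielou_evenness(counts):
    """Pielou's evenness J' = H' / ln(S), S = number of taxa present (0 if S < 2)."""
    s = sum(1 for c in counts if c > 0)
    return shannon_h(counts) / math.log(s) if s > 1 else 0.0


# Hypothetical beetle counts per taxon in two samples
even_sample = [10, 10, 10, 10]
dominated_sample = [97, 1, 1, 1]
print(round(shannon_h(even_sample), 3), round(pielou_evenness(even_sample), 3))  # 1.386 1.0
print(round(shannon_h(dominated_sample), 3), round(pielou_evenness(dominated_sample), 3))  # 0.168 0.121
```

Both samples contain the same four taxa, so the difference in H' here comes entirely from evenness, which is why presence-absence ordinations and abundance-based diversity indices can tell different stories.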

Clustering based on Random Graph Model embedding Vertex Features

Author: Ambroise Christophe
Volant Stevenn
Zanghi Hugo
Publication venue
Publication date: 12/10/2009
Field of study

Large datasets with interactions between objects are common to numerous scientific fields (e.g., social science, the internet, biology). The interactions naturally define a graph, and a common way to explore or summarize such a dataset is graph clustering. Most techniques for clustering graph vertices use only the topology of connections, ignoring the information in the vertex features. In this paper, we provide a clustering algorithm that exploits both types of data, based on a statistical model with latent structure that characterizes each vertex both by a vector of features and by its connectivity. We perform simulations to compare our algorithm with existing approaches, and we also evaluate our method on real datasets based on hypertext documents. We find that our algorithm successfully exploits whatever information is found in both the connectivity pattern and the features.

arXiv.org e-Print Archive

CiteSeerX
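To make "a statistical model with latent structure characterizing each vertex by its features and its connectivity" concrete, here is a toy scoring function: a stochastic-block-style Bernoulli model for edges plus a per-cluster Gaussian for a single scalar feature, evaluated at plug-in maximum-likelihood estimates for a given labeling. This is an assumption-laden sketch, not the authors' model or estimation procedure; it only scores labelings rather than searching for one, and comparisons are meaningful only for a fixed number of clusters.

```python
import math
from itertools import combinations


def complete_loglik(adj, feats, labels, eps=1e-9):
    """Plug-in log-likelihood of a vertex labeling under a toy SBM + Gaussian-feature model.

    adj: symmetric 0/1 adjacency matrix (list of lists); feats: one scalar feature per vertex.
    """
    n = len(labels)
    ll = 0.0
    # Connectivity part: each pair of clusters has its own edge probability,
    # estimated as the observed edge density between (or within) those clusters.
    tallies = {}
    for i, j in combinations(range(n), 2):
        key = tuple(sorted((labels[i], labels[j])))
        edges, pairs = tallies.get(key, (0, 0))
        tallies[key] = (edges + adj[i][j], pairs + 1)
    for edges, pairs in tallies.values():
        p = min(max(edges / pairs, eps), 1 - eps)  # clip to keep the logs finite
        ll += edges * math.log(p) + (pairs - edges) * math.log(1 - p)
    # Feature part: each cluster has its own Gaussian (MLE mean and variance).
    for g in set(labels):
        xs = [feats[i] for i in range(n) if labels[i] == g]
        mu = sum(xs) / len(xs)
        var = max(sum((x - mu) ** 2 for x in xs) / len(xs), eps)
        ll += sum(-0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var) for x in xs)
    return ll


# Toy graph: two triangles with no edges between them; features agree with the triangles
adj = [[0] * 6 for _ in range(6)]
for group in ((0, 1, 2), (3, 4, 5)):
    for i in group:
        for j in group:
            if i != j:
                adj[i][j] = 1
feats = [0.1, 0.2, 0.0, 5.0, 5.1, 4.9]
good = complete_loglik(adj, feats, [0, 0, 0, 1, 1, 1])
bad = complete_loglik(adj, feats, [0, 1, 0, 1, 0, 1])
print(good > bad)  # True
```

Both terms add on the log scale, so a labeling is rewarded when it explains the edges, the features, or (best) both; a real method would maximize such a criterion over labelings, e.g. with an EM-type algorithm.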

Pattern Classification Based On Multi-Hyperellipsoid Clustering.

Author: Cai Yao
Publication venue: DigitalCommons@UNO
Publication date: 01/07/1995
Field of study

Traditional model-based pattern classification is based on the assumption that the distribution of the training samples of each pattern class can be formulated by a single statistical function. It is difficult to make an accurate classification with the traditional method when the training samples of different classes do not conform to this assumption. The main contribution of this research is the development of a new clustering technique, called Multi-Hyperellipsoid Clustering, that is able to handle irregular pattern distributions. The new method uses supervised maximum likelihood estimation to derive a set of distribution functions for the training samples of each class, and then uses an improved Bayesian probability decision model to partition the pattern space. The new classifier achieved a higher rate of correct classification than the traditional method on some rather complex pattern distributions in a number of test examples.

The University of Nebraska, Omaha
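The idea of describing one class by several hyperellipsoids and deciding with Bayes' rule can be sketched in two dimensions as below. Assumptions: the sub-clusters of each class are supplied by the caller (the thesis derives its distribution functions with its own supervised maximum likelihood procedure, which is not reproduced), each sub-cluster is modeled as a bivariate Gaussian whose density contours are ellipses, and each component is weighted by its share of all training samples so that a class's score is proportional to prior times mixture density. All names are hypothetical.

```python
import math


def fit_gaussian_2d(points):
    """MLE mean and 2x2 covariance (sxx, sxy, syy) of a list of (x, y) points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return (mx, my), (sxx, sxy, syy)


def log_density_2d(x, mean, cov):
    """Log bivariate normal density; its level sets are ellipses (2-D hyperellipsoids)."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Squared Mahalanobis distance via the closed-form inverse of a 2x2 matrix
    maha = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return -math.log(2 * math.pi) - 0.5 * math.log(det) - 0.5 * maha


def train(class_subclusters):
    """class_subclusters maps a class label to a list of point lists, one per ellipsoid."""
    total = sum(len(pts) for subs in class_subclusters.values() for pts in subs)
    return {
        label: [(len(pts) / total, *fit_gaussian_2d(pts)) for pts in subs]
        for label, subs in class_subclusters.items()
    }


def classify(model, x):
    """Bayes decision: the class maximizing log sum_k w_k * N(x; mu_k, Sigma_k)."""
    def log_score(components):
        logs = [math.log(w) + log_density_2d(x, m, c) for w, m, c in components]
        top = max(logs)
        return top + math.log(sum(math.exp(v - top) for v in logs))  # log-sum-exp
    return max(model, key=lambda label: log_score(model[label]))


def shifted(points, d):
    return [(x + d, y + d) for x, y in points]


square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
# Class "A" occupies two separate blobs; a single Gaussian for "A" would be centered on "B".
model = train({"A": [square, shifted(square, 10.0)], "B": [shifted(square, 5.0)]})
print(classify(model, (0.4, 0.6)), classify(model, (5.5, 5.4)), classify(model, (10.3, 10.7)))
# A B A
```

The toy layout is the kind of irregular class distribution the abstract refers to: one Gaussian per class would place class A's center on top of class B, while two ellipsoids for A separate the classes cleanly.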

Clustering Scatter Plots Using Data Depth Measures.

Author: Borneman James
Braun Jonathan
Cui Xinping
Jeske Daniel R
Li Xiaoxiao
Zhang Zhanpan
Publication venue: eScholarship, University of California
Publication date: 01/01/2011
Field of study

Clustering is rapidly becoming a powerful data mining technique and has been broadly applied to many domains, such as bioinformatics and text mining. However, existing methods can only handle a data matrix of scalars. In this paper, we introduce a hierarchical clustering procedure that can handle a data matrix of scatter plots. To reflect the nature of the data more accurately, we introduce a dissimilarity statistic based on "data depth" to measure the discrepancy between two bivariate distributions without oversimplifying the nature of the underlying pattern. We then combine hypothesis testing with hierarchical clustering to simultaneously cluster the rows and columns of the data matrix of scatter plots. We also propose novel painting metrics and construct heat maps to allow visualization of the clusters. We demonstrate the utility and power of our new clustering method through simulation studies and an application to a microbe-host-interaction study.

eScholarship - University of California
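A depth-based discrepancy between two scatter plots can be sketched with standard tools, although the abstract does not specify the authors' statistic. Assumptions: Mahalanobis depth 1/(1 + d^2) stands in for whatever depth function the paper uses; the Liu-Singh quality index Q(F, G), the probability that a point from F is no deeper in F than a point from G, compares the two samples (values below 1/2 suggest G is shifted or more spread out relative to F); and the symmetric combination `depth_dissimilarity` is hypothetical. Ties count one half, so two identical samples give Q = 1/2 exactly.

```python
def mahalanobis_depth(sample):
    """Return a function x -> 1 / (1 + d^2(x)), d = Mahalanobis distance to the 2-D sample."""
    n = len(sample)
    mx = sum(p[0] for p in sample) / n
    my = sum(p[1] for p in sample) / n
    sxx = sum((p[0] - mx) ** 2 for p in sample) / n
    syy = sum((p[1] - my) ** 2 for p in sample) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in sample) / n
    det = sxx * syy - sxy ** 2

    def depth(p):
        dx, dy = p[0] - mx, p[1] - my
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        return 1.0 / (1.0 + d2)

    return depth


def liu_singh_q(ref, other):
    """Empirical Q(F, G) = P(D(X; F) <= D(Y; F)), X from ref, Y from other; ties count 1/2."""
    depth = mahalanobis_depth(ref)
    dx = [depth(p) for p in ref]
    dy = [depth(p) for p in other]
    hits = sum(1.0 if a < b else 0.5 if a == b else 0.0 for a in dx for b in dy)
    return hits / (len(dx) * len(dy))


def depth_dissimilarity(s1, s2):
    """Hypothetical symmetric discrepancy: how far Q(F, G) and Q(G, F) fall from 1/2."""
    return abs(liu_singh_q(s1, s2) - 0.5) + abs(liu_singh_q(s2, s1) - 0.5)


# Two toy scatter plots: a 3x3 grid and the same grid moved far away
grid = [(float(x), float(y)) for x in (-1, 0, 1) for y in (-1, 0, 1)]
far = [(x + 10.0, y + 10.0) for x, y in grid]
print(depth_dissimilarity(grid, grid), depth_dissimilarity(grid, far))  # 0.0 1.0
```

A pairwise matrix of such dissimilarities between cells of the scatter-plot matrix is what a hierarchical clustering routine would consume; the hypothesis-testing layer described in the abstract is not sketched here.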