
    Segregation Indices for Disease Clustering

    Spatial clustering has important implications in various fields. In particular, disease clustering is of major public concern in epidemiology. In this article, we propose the use of two distance-based segregation indices to test the significance of disease clustering among subjects whose locations come from a homogeneous or an inhomogeneous population. We derive their asymptotic distributions and compare them with other distance-based disease clustering tests in terms of empirical size and power by extensive Monte Carlo simulations. The null pattern we consider is the random labeling (RL) of cases and controls to the given locations. Along these lines, we investigate the sensitivity of the size of these tests to the underlying background pattern (e.g., clustered or homogeneous) on which the RL is applied, to the level of clustering and number of clusters, and to differences in the relative abundances of the classes. We demonstrate that differences in relative abundance have the highest impact on the empirical sizes of the tests. We also propose various non-RL patterns as alternatives to the RL pattern and assess the empirical power of the tests under these alternatives. We illustrate the methods on two real-life examples from epidemiology.
    Comment: 31 pages, 13 figures, 3 tables
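
    The random labeling (RL) null can be illustrated with a Monte Carlo permutation test. This is a minimal sketch under our own assumptions, not the authors' segregation indices or their asymptotic tests: it uses a simple hypothetical statistic, the number of subjects whose nearest neighbor carries the same label, and permutes the case/control labels over the fixed locations. The function names and the `n_sim` and `seed` parameters are made up for this example.

```python
import math
import random

def nn_same_label_count(points, labels):
    """Hypothetical segregation statistic: number of points whose nearest
    neighbor has the same label. Large values suggest like labels cluster."""
    n = len(points)
    count = 0
    for i in range(n):
        best_j, best_d = None, math.inf
        for j in range(n):
            if i == j:
                continue
            d = math.dist(points[i], points[j])
            if d < best_d:
                best_j, best_d = j, d
        if labels[best_j] == labels[i]:
            count += 1
    return count

def random_labeling_test(points, labels, n_sim=999, seed=0):
    """Monte Carlo p-value under random labeling: locations stay fixed,
    only the labels are shuffled among them."""
    rng = random.Random(seed)
    observed = nn_same_label_count(points, labels)
    shuffled = list(labels)
    at_least = 0
    for _ in range(n_sim):
        rng.shuffle(shuffled)
        if nn_same_label_count(points, shuffled) >= observed:
            at_least += 1
    # Count the observed labeling as one member of the reference set.
    p_value = (at_least + 1) / (n_sim + 1)
    return observed, p_value
```

    Because only the labels move, each permutation keeps the numbers of cases and controls, and therefore their relative abundance, exactly as observed.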

    Frequent-pattern based iterative projected clustering

    Irrelevant attributes add noise to high-dimensional clusters and make traditional clustering techniques inappropriate. Projected clustering algorithms have been proposed to find clusters in hidden subspaces. We observe an analogy between mining frequent itemsets and discovering the relevant subspace for a given cluster. We propose a methodology for finding projected clusters by mining frequent itemsets, and we present heuristics that improve its quality. Our techniques are evaluated with synthetic and real data; they are scalable and discover projected clusters accurately. © 2003 IEEE.
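
    The analogy between frequent itemsets and relevant subspaces can be sketched as follows, under simplifying assumptions that are ours rather than the paper's. Each point becomes a "transaction" listing the dimensions in which it lies within a hypothetical `width` of a cluster medoid, and an Apriori-style level-wise search returns the dimension sets supported by at least a `min_support` fraction of the points. The largest such set is taken as the cluster's relevant subspace. The paper's iterative refinement and quality heuristics are not reproduced here.

```python
from itertools import combinations

def to_transactions(points, medoid, width):
    """Each point -> set of dimensions where it lies within `width` of the medoid."""
    return [
        frozenset(d for d, (x, m) in enumerate(zip(p, medoid)) if abs(x - m) <= width)
        for p in points
    ]

def frequent_itemsets(transactions, min_support):
    """Apriori-style search for dimension sets with support >= min_support."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    items = sorted({d for t in transactions for d in t})
    current = [frozenset([d]) for d in items if support(frozenset([d])) >= min_support]
    result = list(current)
    k = 2
    while current:
        # Join frequent (k-1)-sets into candidate k-sets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune candidates having an infrequent (k-1)-subset.
        prev = set(current)
        candidates = [c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))]
        current = [c for c in candidates if support(c) >= min_support]
        result.extend(current)
        k += 1
    return result

def relevant_subspace(points, medoid, width, min_support):
    """Largest frequent dimension set (ties broken deterministically)."""
    itemsets = frequent_itemsets(to_transactions(points, medoid, width), min_support)
    if not itemsets:
        return frozenset()
    return max(itemsets, key=lambda s: (len(s), tuple(sorted(s))))
```

    The downward-closure property (every subset of a frequent set is frequent) is what makes the level-wise search prune the exponential space of dimension subsets.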

    Modeling Asymmetric Volatility Clusters Using Copulas and High Frequency Data

    Volatility clustering is a well-known stylized feature of financial asset returns. In this paper, we investigate the asymmetric pattern of volatility clustering in both stock and foreign exchange rate markets. To this end, we employ copula-based semi-parametric univariate time-series models that accommodate clusters of both large and small volatilities. Using daily realized volatilities of individual company stocks, stock indices and foreign exchange rates constructed from high-frequency data, we find that volatility clustering is strongly asymmetric, in the sense that clusters of large volatilities tend to be much stronger than those of small volatilities. In addition, the asymmetric pattern of volatility clusters remains visible even when the clusters are allowed to change over time, and the volatility clusters themselves remain persistent even after forty days.
    Keywords: Volatility clustering, Copulas, Realized volatility, High-frequency data.
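
    Asymmetric volatility clustering can be made concrete by asking whether today's volatility co-moves with yesterday's more strongly in the upper tail than in the lower tail. The sketch below is a nonparametric stand-in, not the semi-parametric copula models used in the paper: it rank-transforms consecutive pairs (x[t-1], x[t]) of a realized-volatility series into pseudo-observations (an empirical copula) and reports empirical lower and upper tail-dependence ratios at a hypothetical threshold `q`.

```python
def pseudo_observations(xs):
    """Map a sample into (0, 1) via rank / (n + 1); ties are broken by position."""
    n = len(xs)
    order = sorted(range(n), key=lambda i: xs[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u

def tail_dependence(series, q=0.1):
    """Empirical lower and upper tail-dependence ratios of (x[t-1], x[t])."""
    u = pseudo_observations(series[:-1])  # yesterday's volatility
    v = pseudo_observations(series[1:])   # today's volatility
    n = len(u)
    # Fraction of days where both are in the bottom (top) q-tail, scaled by q.
    lower = sum(1 for a, b in zip(u, v) if a <= q and b <= q) / (n * q)
    upper = sum(1 for a, b in zip(u, v) if a > 1 - q and b > 1 - q) / (n * q)
    return lower, upper
```

    An upper ratio well above the lower one is consistent with clusters of large volatilities being stronger than clusters of small volatilities; a fitted parametric copula would be needed for formal inference.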

    Floodplain connectivity, disturbance and change: a palaeoentomological investigation of floodplain ecology from south-west England

    1. Floodplain environments are increasingly subject to enhancement and restoration, with the purpose of increasing their biodiversity and returning them to a more 'natural' state. Defining such a state based solely on neoecological data is problematic, which has led several authors to suggest a palaeoecological approach.
    2. Fossil coleopteran assemblages recovered from multiple palaeochannel fills in south-west England were used to investigate past floodplain and channel characteristics during the mid- to late Holocene. Ordination of the coleopteran data using Detrended Correspondence Analysis (DCA) produced clear, discrete clustering. This clustering pattern is related to the nature of the environment in which the assemblages were deposited, and hence to channel configuration and dynamics.
    3. The DCA clustering pattern is strongly related to measures of ecological evenness, and a strong relationship was revealed between these indices and the composition of the water beetle assemblage within samples. Repeating the ordination with presence–absence data produces a similar clustering pattern, implying that assemblage composition is crucial in determining cluster placement.
    4. As assemblage composition is primarily a function of floodplain topography, and hence of disturbance regime, we attempt to relate these data to the Intermediate Disturbance Hypothesis (IDH). A significant positive correlation was found between ecological diversity (Shannon's H') and Axis 1 of all ordinations in predominantly aquatic assemblages.
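
    The final step, correlating diversity with ordination scores, can be sketched as follows. Shannon's index is H' = -Σ p_i ln p_i, where p_i is the proportion of individuals belonging to taxon i. The DCA ordination itself is not implemented here; the Axis 1 scores are assumed to come from an existing ordination package, and the Pearson correlation with its t statistic (n - 2 degrees of freedom) is our choice of test, not necessarily the one used by the authors.

```python
import math

def shannon_h(counts):
    """Shannon's H' = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)), referred to Student's t with n - 2 df."""
    return r * math.sqrt((n - 2) / (1 - r * r))
```

    In use, `shannon_h` would be applied to each sample's taxon counts, and the resulting H' values correlated with that sample's Axis 1 score.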

    Clustering based on Random Graph Model embedding Vertex Features

    Large datasets describing interactions between objects are common in numerous scientific fields (e.g., social science, the internet, biology). The interactions naturally define a graph, and a common way to explore or summarize such a dataset is graph clustering. Most techniques for clustering graph vertices use only the topology of the connections, ignoring the information in the vertex features. In this paper, we provide a clustering algorithm that exploits both types of data, based on a statistical model with a latent structure that characterizes each vertex both by a vector of features and by its connectivity. We perform simulations to compare our algorithm with existing approaches, and we also evaluate our method on real datasets of hypertext documents. We find that our algorithm successfully exploits whatever information is present in both the connectivity pattern and the features.
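
    A toy latent-class model that uses both connectivity and vertex features is sketched below. It is not the authors' algorithm. As assumptions of this example, edges are Bernoulli with a probability depending only on the two vertices' classes (a stochastic block model), features are isotropic Gaussian around a class mean, maximum-likelihood parameters are plugged in for a given labeling, and the labeling is improved by greedy single-vertex moves. The function names and the `sigma` and `max_sweeps` parameters are hypothetical.

```python
import math

def joint_log_likelihood(adj, feats, z, k, sigma=1.0):
    """Log-likelihood of labeling z under a toy joint model:
    edges ~ Bernoulli(pi[z_i][z_j]) for i < j, features ~ N(mu[z_i], sigma^2 I),
    with pi and mu set to their maximum-likelihood values for this labeling."""
    n, dim = len(z), len(feats[0])
    edges = [[0] * k for _ in range(k)]
    pairs = [[0] * k for _ in range(k)]
    for i in range(n):
        for j in range(i + 1, n):
            a, b = sorted((z[i], z[j]))
            pairs[a][b] += 1
            edges[a][b] += adj[i][j]
    ll = 0.0
    for a in range(k):
        for b in range(a, k):
            m, e = pairs[a][b], edges[a][b]
            if 0 < e < m:
                p = e / m  # plug-in MLE of the block edge probability
                ll += e * math.log(p) + (m - e) * math.log(1 - p)
            # Blocks with p = 0 or p = 1 contribute log(1) = 0.
    for c in range(k):
        members = [feats[i] for i in range(n) if z[i] == c]
        if not members:
            continue
        mu = [sum(x[d] for x in members) / len(members) for d in range(dim)]
        for x in members:
            sq = sum((x[d] - mu[d]) ** 2 for d in range(dim))
            ll += -0.5 * sq / sigma ** 2 - dim * math.log(sigma * math.sqrt(2 * math.pi))
    return ll

def greedy_cluster(adj, feats, z, k, max_sweeps=20):
    """Move single vertices to the class that most increases the likelihood."""
    z = list(z)
    best = joint_log_likelihood(adj, feats, z, k)
    for _ in range(max_sweeps):
        improved = False
        for i in range(len(z)):
            current, best_c = z[i], z[i]
            for c in range(k):
                if c == current:
                    continue
                z[i] = c
                ll = joint_log_likelihood(adj, feats, z, k)
                if ll > best + 1e-9:
                    best, best_c = ll, c
            z[i] = best_c
            improved = improved or best_c != current
        if not improved:
            break
    return z
```

    Greedy moves only reach a local optimum, so in practice several starting labelings would be tried; the feature term lets vertices with similar features be grouped even when their connectivity alone is ambiguous.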

    Pattern Classification Based on Multi-Hyperellipsoid Clustering

    Traditional model-based pattern classification assumes that the distribution of the training samples of each pattern class can be described by a single statistical function. When the training samples of different classes do not conform to this assumption, the traditional method has difficulty classifying accurately. The main contribution of this research is a new clustering technique, called Multi-Hyperellipsoid Clustering, that can handle irregular pattern distributions. The new method uses supervised maximum likelihood estimation to derive a set of distribution functions for the training samples of each class, and then uses an improved Bayesian probability decision model to partition the pattern space. On a number of test examples with rather complex pattern distributions, the new classifier achieved a higher rate of correct classification than the traditional method.
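
    The idea of describing one class with several ellipsoids, rather than a single distribution, can be sketched as follows. As assumptions of this example, each class's training samples are split into a user-chosen number of sub-clusters with k-means (deterministic farthest-point initialization), each sub-cluster is modeled by an axis-aligned (diagonal-covariance) Gaussian, and a test point is assigned to the class with the largest prior times mixture density. This substitutes standard techniques for the paper's supervised maximum likelihood estimation and improved Bayesian decision model, which are not reproduced; `fit_classifier`, `predict`, and their parameters are hypothetical names.

```python
import math

def kmeans(points, k, iters=50):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [tuple(points[0])]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(tuple(far))
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
            groups[nearest].append(p)
        new = [tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[j]
               for j, g in enumerate(groups)]
        if new == centers:
            break
        centers = new
    return [g for g in groups if g]

def fit_diag_gaussian(points, var_floor=1e-3):
    """Mean and per-dimension variance: an axis-aligned hyperellipsoid."""
    n = len(points)
    cols = list(zip(*points))
    mean = [sum(col) / n for col in cols]
    var = [max(sum((x - m) ** 2 for x in col) / n, var_floor)
           for col, m in zip(cols, mean)]
    return mean, var

def log_diag_gaussian(x, mean, var):
    return sum(-0.5 * math.log(2 * math.pi * v) - 0.5 * (xi - m) ** 2 / v
               for xi, m, v in zip(x, mean, var))

def fit_classifier(training, components):
    """training: label -> list of points; components: label -> number of ellipsoids."""
    total = sum(len(pts) for pts in training.values())
    model = {}
    for label, pts in training.items():
        comps = [(len(g) / len(pts), *fit_diag_gaussian(g))
                 for g in kmeans(pts, components[label])]
        model[label] = (len(pts) / total, comps)
    return model

def predict(model, x):
    """Bayes decision: argmax over classes of log prior + log mixture density."""
    def score(label):
        prior, comps = model[label]
        logs = [math.log(w) + log_diag_gaussian(x, m, v) for w, m, v in comps]
        top = max(logs)  # log-sum-exp for numerical stability
        return math.log(prior) + top + math.log(sum(math.exp(s - top) for s in logs))
    return max(model, key=score)
```

    For a class made of two well-separated blobs, giving it two components lets each blob receive its own ellipsoid instead of one broad ellipsoid centered in the empty space between them.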