17,586 research outputs found

    Cluster validity in clustering methods

    Noise resistant generalized parametric validity index of clustering for gene expression data

    This article has been made available through the Brunel Open Access Publishing Fund. Validity indices have been investigated for decades. However, since the literature contains no study of the noise-resistance performance of these indices, there is no guideline for determining the best clustering in noisy data sets, especially microarray data sets. In this paper, we propose a generalized parametric validity (GPV) index which employs two tunable parameters α and β to control the proportions of objects considered when calculating the dissimilarities. The greatest advantage of the proposed GPV index is its noise resistance, which results from the flexibility of tuning the parameters. Several rules are set to guide the selection of parameter values. To illustrate the noise-resistance performance of the proposed index, we evaluate the GPV index for assessing five clustering algorithms in two gene expression data simulation models with different noise levels, and compare its ability to determine the number of clusters with that of eight existing indices. We also test the GPV index on three groups of real gene expression data sets. The experimental results suggest that the proposed GPV index has superior noise resistance and provides fairly accurate judgements.
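    The abstract does not give the GPV formula, so the following is only a minimal sketch of the general idea of letting two proportions decide how many pairwise dissimilarities enter a validity score. The function name `trimmed_validity`, the choice to keep the smallest distances, and the inter/intra ratio are all assumptions made for illustration; they are not the published index.

```python
import numpy as np


def trimmed_validity(X, labels, alpha=0.8, beta=0.8):
    """Hypothetical trimmed validity score (NOT the published GPV formula).

    alpha: fraction of the smallest within-cluster distances that is kept.
    beta:  fraction of the smallest between-cluster distances that is kept.
    Returns mean(kept between) / mean(kept within); larger suggests
    better-separated clusters.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)

    # Pairwise Euclidean distances between all objects.
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=2))

    # Each unordered pair once, split by whether the two labels agree.
    rows, cols = np.triu_indices(len(X), k=1)
    same = labels[rows] == labels[cols]
    intra = np.sort(D[rows, cols][same])
    inter = np.sort(D[rows, cols][~same])
    if intra.size == 0 or inter.size == 0:
        raise ValueError("need at least one within- and one between-cluster pair")

    # Keep only the requested proportion of each sorted list (assumed rule).
    k_intra = max(1, int(np.ceil(alpha * intra.size)))
    k_inter = max(1, int(np.ceil(beta * inter.size)))
    return inter[:k_inter].mean() / intra[:k_intra].mean()


# Two well-separated groups of three points each.
X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
good = [0, 0, 0, 1, 1, 1]   # labels that match the groups
bad = [0, 1, 0, 1, 0, 1]    # labels that cut across the groups
print(trimmed_validity(X, good) > trimmed_validity(X, bad))
```

    Lowering `alpha` drops the largest within-cluster distances, which is where a few noisy objects would otherwise dominate the score. How the real index selects objects and combines the dissimilarities is described in the paper, not here.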

    Relational visual cluster validity

    The assessment of cluster validity plays a very important role in cluster analysis. The most commonly used cluster validity methods are based on statistical hypothesis testing or on finding the best clustering scheme by computing a number of different cluster validity indices. A number of visual methods of cluster validity have been produced to display the validity of clusters directly by mapping data into two- or three-dimensional space. However, these methods may lose too much information to correctly estimate the results of clustering algorithms. Although the visual cluster validity (VCV) method of Hathaway and Bezdek can successfully solve this problem, it can only be applied to object data, i.e. feature measurements. There are very few validity methods that can be used to analyze the validity of data where only a similarity or dissimilarity relation exists – relational data. To tackle this problem, this paper presents a relational visual cluster validity (RVCV) method to assess the validity of clustering relational data. This is done by combining the results of the non-Euclidean relational fuzzy c-means (NERFCM) algorithm with a modification of the VCV method to produce a visual representation of cluster validity. RVCV can cluster complete and incomplete relational data and adds to the visual cluster validity theory. Numeric examples using synthetic and real data are presented.

    Multifractal current distribution in random diode networks

    Recently it has been shown analytically that electric currents in a random diode network are distributed in a multifractal manner [O. Stenull and H. K. Janssen, Europhys. Lett. 55, 691 (2001)]. In the present work we investigate the multifractal properties of a random diode network at the critical point by numerical simulations. We analyze the currents running on a directed percolation cluster and confirm the field-theoretic predictions for the scaling behavior of moments of the current distribution. It is pointed out that a random diode network is a particularly good candidate for a possible experimental realization of directed percolation. Comment: RevTeX, 4 pages, 5 eps figures

    Organizational Pay Mix: The Implications of Various Theoretical Perspectives for the Conceptualization and Measurement of Individual Pay Components

    While pay mix is one of the most frequently used variables in recent compensation research, its theoretical relevance and measurement remain underdeveloped. There is little agreement among studies on the definitions of the various forms of pay that go into pay mix. Even studies that examine the same theories tend to overlook the implications of differences in the measures and meanings of pay mix used in other studies. Our study explores the meaning of pay mix using several theories commonly used in recent compensation research (agency, efficiency wage, expectancy, equity, and person-organization fit). Recent studies generally use a single measure of mix (e.g., bonus/base, or stock options/total, or benefits/base). We argue that to fully understand the effects of employee compensation, the multiple forms of compensation must be taken into account. Therefore, we derived pay mix measures from the theories commonly used in compensation research. We classified the pay mix policies of 478 firms using cluster-analytic techniques. We found that the classification of organizations based on their pay mix depends on the measures used. We suggest that using more realistic measures of pay mix leads to a reinterpretation of compensation research and offers directions for theory development.

    Detection of Early-Stage Enterprise Infection by Mining Large-Scale Log Data

    Recent years have seen the rise of more sophisticated attacks, including advanced persistent threats (APTs), which pose severe risks to organizations and governments by targeting confidential proprietary information. Additionally, new malware strains are appearing at a higher rate than ever before. Since many of these malware strains are designed to evade existing security products, traditional defenses deployed by most enterprises today, e.g., anti-virus, firewalls, and intrusion detection systems, often fail to detect infections at an early stage. We address the problem of detecting early-stage infection in an enterprise setting by proposing a new framework based on belief propagation inspired by graph theory. Belief propagation can be used either with "seeds" of compromised hosts or malicious domains (provided by the enterprise security operation center -- SOC) or without any seeds. In the latter case we develop a detector of C&C communication particularly tailored to enterprises, which can detect a stealthy compromise of only a single host communicating with the C&C server. We demonstrate that our techniques perform well on detecting enterprise infections. We achieve high accuracy with low false detection and false negative rates on two months of anonymized DNS logs released by Los Alamos National Lab (LANL), which include APT infection attacks simulated by LANL domain experts. We also apply our algorithms to 38TB of real-world web proxy logs collected at the border of a large enterprise. Through careful manual investigation in collaboration with the enterprise SOC, we show that our techniques identified hundreds of malicious domains overlooked by state-of-the-art security products.
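    The seeded mode described above can be pictured with a much simpler stand-in: a threshold-based propagation over a host-to-domain graph built from log records. This is not the authors' belief propagation framework or scoring; the function name `propagate_from_seeds`, the 0.5 threshold, the round limit, and the example host and domain names are all hypothetical.

```python
from collections import defaultdict


def propagate_from_seeds(edges, seed_hosts, rounds=3, threshold=0.5):
    """Simplified seed propagation on a bipartite host/domain graph.

    edges: iterable of (host, domain) pairs, e.g. taken from DNS or proxy logs.
    A domain is flagged when at least `threshold` of the hosts contacting it
    are already flagged; a host is flagged when it contacts a flagged domain.
    Returns (flagged_hosts, flagged_domains).
    """
    hosts_of = defaultdict(set)    # domain -> hosts that contacted it
    domains_of = defaultdict(set)  # host -> domains it contacted
    for host, domain in edges:
        hosts_of[domain].add(host)
        domains_of[host].add(domain)

    bad_hosts = set(seed_hosts)
    bad_domains = set()
    for _ in range(rounds):
        new_domains = {
            d for d, hs in hosts_of.items()
            if d not in bad_domains and len(hs & bad_hosts) / len(hs) >= threshold
        }
        bad_domains |= new_domains
        new_hosts = {
            h for h, ds in domains_of.items()
            if h not in bad_hosts and ds & bad_domains
        }
        bad_hosts |= new_hosts
        if not new_domains and not new_hosts:
            break  # nothing changed, so further rounds would be no-ops
    return bad_hosts, bad_domains


# Hypothetical log: two hosts contact a rare domain, five contact a popular one.
edges = [("h1", "c2.example"), ("h2", "c2.example")]
edges += [(h, "news.example") for h in ("h1", "h2", "h3", "h4", "h5")]
hosts, domains = propagate_from_seeds(edges, seed_hosts={"h1"})
print(sorted(hosts), sorted(domains))
```

    In this toy log the rare domain is flagged because half of its visitors are the seed, which in turn flags the second host, while the popular domain stays below the threshold. A real detector would need far more careful scoring to keep false detections low.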