2,571 research outputs found

    Coupled clustering ensemble by exploring data interdependence

    © 2018 ACM. Clustering ensembles combine multiple partitions of data into a single clustering solution. It is an effective technique for improving the quality of clustering results. Current clustering ensemble algorithms are usually built on the pairwise agreements between clusterings that focus on the similarity via consensus functions, between data objects that induce similarity measures from partitions and re-cluster objects, and between clusters that collapse groups of clusters into meta-clusters. In most of those models, there is a strong IIDness assumption (i.e., independent and identical distribution), which states that base clusterings perform independently of one another and that all objects are also independent. In the real world, however, objects are likely to be related to each other through features that are either explicit or implicit. There is also a latent but definite relationship among intermediate base clusterings because they are derived from the same set of data. All these demand a further investigation of clustering ensembles that explores the interdependence characteristics of data. To solve this problem, a new coupled clustering ensemble (CCE) framework that works on the interdependent nature of objects and intermediate base clusterings is proposed in this article. The main idea is to model the coupling relationship between objects by aggregating the similarity of base clusterings, and the interactive relationship among objects by addressing their neighborhood domains. Once these interdependence relationships are discovered, they will act as critical supplements to clustering ensembles. We verified our proposed framework by using three types of consensus function: clustering-based, object-based, and cluster-based. Substantial experiments on multiple synthetic and real-life benchmark datasets indicate that CCE can effectively capture the implicit interdependence relationships among base clusterings and among objects with higher clustering accuracy, stability, and robustness compared to 14 state-of-the-art techniques, supported by statistical analysis. In addition, a sensitivity analysis shows that the final clustering quality depends on the data characteristics (e.g., quality and consistency) of the base clusterings. Finally, the applications in document clustering, as well as on datasets with much larger size and dimensionality, further demonstrate the effectiveness, efficiency, and scalability of our proposed models.
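As a rough illustration of what an object-based consensus function starts from, the sketch below builds a plain co-association matrix: for each pair of objects, the fraction of base clusterings that put them in the same cluster. This is the generic evidence-accumulation building block, not the CCE coupling model from the abstract. The function name `co_association` and the toy partitions are assumptions made up for the example.

```python
def co_association(partitions):
    """Return an n x n matrix whose (i, j) entry is the fraction of
    base clusterings that assign objects i and j to the same cluster.

    `partitions` is a list of label lists, one per base clustering
    (hypothetical input format chosen for this sketch).
    """
    n_objects = len(partitions[0])
    n_clusterings = len(partitions)
    return [
        [
            sum(p[i] == p[j] for p in partitions) / n_clusterings
            for j in range(n_objects)
        ]
        for i in range(n_objects)
    ]


# Three toy base clusterings of four objects. Label values are arbitrary:
# only co-membership within a clustering matters.
base_clusterings = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]
similarity = co_association(base_clusterings)
```

Objects 0 and 1 share a cluster in every base clustering, so their entry is 1.0, while objects 0 and 3 never do, so theirs is 0.0. A final partition could then be obtained by re-clustering the objects with this matrix as the similarity measure.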

    Dynamical Systems on Networks: A Tutorial

    We give a tutorial for the study of dynamical systems on networks. We focus especially on "simple" situations that are tractable analytically, because they can be very insightful and provide useful springboards for the study of more complicated scenarios. We briefly motivate why examining dynamical systems on networks is interesting and important, and we then give several fascinating examples and discuss some theoretical results. We also briefly discuss dynamical systems on dynamical (i.e., time-dependent) networks, overview software implementations, and give an outlook on the field. Comment: 39 pages, 1 figure, submitted, more examples and discussion than original version, some reorganization and also more pointers to interesting direction

    The structure and dynamics of multilayer networks

    In recent years, network theory has successfully characterized the interaction among the constituents of a variety of complex systems, ranging from biological to technological and social systems. However, until recently, attention was almost exclusively given to networks in which all components were treated on an equivalent footing, while neglecting all the extra information about the temporal- or context-related properties of the interactions under study. Only in the last few years, taking advantage of the enhanced resolution in real data sets, have network scientists directed their interest to the multiplex character of real-world systems and explicitly considered the time-varying and multilayer nature of networks. We offer here a comprehensive review of both the structural and dynamical organization of graphs made of diverse relationships (layers) between their constituents, and cover several relevant issues, from a full redefinition of the basic structural measures to understanding how the multilayer nature of the network affects processes and dynamics. Comment: In Press, Accepted Manuscript, Physics Reports 201

    Multilayer Networks

    In most natural and engineered systems, a set of entities interact with each other in complicated patterns that can encompass multiple types of relationships, change in time, and include other types of complications. Such systems include multiple subsystems and layers of connectivity, and it is important to take such "multilayer" features into account to try to improve our understanding of complex systems. Consequently, it is necessary to generalize "traditional" network theory by developing (and validating) a framework and associated tools to study multilayer systems in a comprehensive fashion. The origins of such efforts date back several decades and arose in multiple disciplines, and now the study of multilayer networks has become one of the most important directions in network science. In this paper, we discuss the history of multilayer networks (and related concepts) and review the exploding body of work on such networks. To unify the disparate terminology in the large body of recent work, we discuss a general framework for multilayer networks, construct a dictionary of terminology to relate the numerous existing concepts to each other, and provide a thorough discussion that compares, contrasts, and translates between related notions such as multilayer networks, multiplex networks, interdependent networks, networks of networks, and many others. We also survey and discuss existing data sets that can be represented as multilayer networks. We review attempts to generalize single-layer-network diagnostics to multilayer networks. We also discuss the rapidly expanding research on multilayer-network models and notions like community structure, connected components, tensor decompositions, and various types of dynamical processes on multilayer networks. We conclude with a summary and an outlook. Comment: Working paper; 59 pages, 8 figure

    Using a scenario-neutral framework to avoid potential maladaptation to future flood risk

    This study develops a coherent framework to detect those catchment types associated with a high risk of maladaptation to future flood risk. Using the "scenario-neutral" approach to impact assessment, the sensitivity of Irish catchments to fluvial flooding is examined in the context of national climate change allowances. A predefined sensitivity domain is used to quantify flood responses to +2 °C mean annual temperature with incremental changes in the seasonality and mean of the annual precipitation cycle. The magnitude of the 20-year flood is simulated at each increment using two rainfall-runoff models (GR4J, NAM), then concatenated as response surfaces for 35 sample catchments. A typology of catchment sensitivity is developed using clustering and discriminant analysis of physical attributes. The same attributes are used to classify 215 ungauged/data-sparse catchments. To address possible redundancies, the exposure of different catchment types to projected climate is established using an objectively selected subset of the Coupled Model Intercomparison Project Phase 5 ensemble. Hydrological model uncertainty is shown to significantly influence sensitivity and have a greater effect than ensemble bias. A national flood risk allowance of 20%, considering all 215 catchments, is shown to afford protection against ~48% to 98% of the uncertainty in the Coupled Model Intercomparison Project Phase 5 subset (Representative Concentration Pathway 8.5; 2070–2099), irrespective of hydrological model and catchment type. However, results indicate that assuming a standard national or regional allowance could lead to local over/under adaptation. Herein, catchments with relatively less storage are sensitive to seasonal amplification in the annual cycle of precipitation and warrant special attention.

    Constraints on the Dark Side of the Universe and Observational Hubble Parameter Data

    This paper is a review of the observational Hubble parameter data that have gained increasing attention in recent years for their illuminating power on the dark side of the universe: the dark matter, dark energy, and the dark age. Currently, there are two major methods of independent observational H(z) measurement, which we summarize as the "differential age method" and the "radial BAO size method". Starting with fundamental cosmological notions such as the spacetime coordinates in an expanding universe, we present the basic principles behind the two methods. We further review the two methods in greater detail, including the sources of errors. We show how the observational H(z) data serve as a useful tool in the study of cosmological models and parameter constraints, and we also discuss several issues associated with their applications. Finally, we point the reader to the future prospect of upcoming observation programs that will lead to some major improvements in the quality of observational H(z) data. Comment: 20 pages, 6 figures, and 1 table, uses REVTeX 4.1. Review article, accepted by Advances in Astronom
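For orientation, the relation usually taken as the starting point of the differential age method can be written out as follows. This is the standard textbook identity, not a formula quoted from the abstract; the symbols $a$ (scale factor) and $t$ (cosmic time) are assumptions introduced here.

```latex
% With the scale factor normalized so that a = 1/(1+z):
\[
H(z) \;\equiv\; \frac{\dot{a}}{a}
      \;=\; -\frac{1}{1+z}\,\frac{\mathrm{d}z}{\mathrm{d}t}
\]
```

Under this relation, an estimate of the age difference $\mathrm{d}t$ between objects separated by a small redshift interval $\mathrm{d}z$ gives $H(z)$ directly, without assuming a particular cosmological model.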

    Honeybee-like collective decision making in a kilobot swarm

    Drawing inspiration from honeybee swarms' nest-site selection process, we assess the ability of a kilobot robot swarm to replicate this captivating example of collective decision-making. Honeybees locate the optimal site for their new nest by aggregating information about potential locations and exchanging it through their waggle-dance. The complexity and elegance of solving this problem rely on two key abilities of scout honeybees: self-discovery and imitation, symbolizing independence and interdependence, respectively. We employ a mathematical model to represent this nest-site selection problem and program our kilobots to follow its rules. Our experiments demonstrate that the kilobot swarm can collectively reach consensus decisions in a decentralized manner, akin to honeybees. However, the strength of this consensus depends not only on the interplay between independence and interdependence but also on critical factors such as swarm density and the motion of kilobots. These factors enable the formation of a percolated communication network, through which each robot can receive information beyond its immediate vicinity. By shedding light on this crucial layer of complexity (the crowding and mobility conditions during the decision-making), we emphasize the significance of factors typically overlooked but essential to living systems and life itself. Comment: 19 pages, 8 figures, 6 appendix figures, 3 supplementary figure

    Homophily Outlier Detection in Non-IID Categorical Data

    Most existing outlier detection methods assume that the outlier factors (i.e., outlierness scoring measures) of data entities (e.g., feature values and data objects) are Independent and Identically Distributed (IID). This assumption does not hold in real-world applications where the outlierness of different entities is dependent on each other and/or taken from different probability distributions (non-IID). This may lead to the failure to detect important outliers that are too subtle to be identified without considering the non-IID nature. The issue is intensified further in more challenging contexts, e.g., high-dimensional data with many noisy features. This work introduces a novel outlier detection framework and its two instances to identify outliers in categorical data by capturing non-IID outlier factors. Our approach first defines and incorporates distribution-sensitive outlier factors and their interdependence into a value-value graph-based representation. It then models an outlierness propagation process in the value graph to learn the outlierness of feature values. The learned value outlierness allows for either direct outlier detection or outlying feature selection. The graph representation and mining approach is employed here to capture the rich non-IID characteristics. Our empirical results on 15 real-world data sets with different levels of data complexity show that (i) the proposed outlier detection methods significantly outperform five state-of-the-art methods at the 95%/99% confidence level, achieving 10%-28% AUC improvement on the 10 most complex data sets; and (ii) the proposed feature selection methods significantly outperform three competing methods in enabling subsequent outlier detection of two different existing detectors. Comment: To appear in Data Mining and Knowledge Discovery Journa