2,913 research outputs found

    Outlier Edge Detection Using Random Graph Generation Models and Applications

    Outliers are samples that are generated by different mechanisms than other, normal data samples. Graphs, in particular social network graphs, may contain nodes and edges created by scammers, by malicious programs, or mistakenly by normal users. Detecting outlier nodes and edges is important for data mining and graph analytics. However, previous research in the field has focused mainly on detecting outlier nodes. In this article, we study the properties of edges and propose outlier edge detection algorithms using two random graph generation models. We found that the edge-ego-network, defined as the induced graph containing the two end nodes of an edge, their neighboring nodes and the edges that link these nodes, contains critical information for detecting outlier edges. We evaluated the proposed algorithms by injecting outlier edges into real-world graph data. Experimental results show that the proposed algorithms can effectively detect outlier edges. In particular, the algorithm based on the Preferential Attachment random graph generation model consistently performs well regardless of the test graph data. Furthermore, the proposed algorithms are not limited to outlier edge detection. We demonstrate three applications that benefit from them: 1) a preprocessing tool that improves the performance of graph clustering algorithms; 2) an outlier node detection algorithm; and 3) a novel noisy data clustering algorithm. These applications show the great potential of the proposed outlier edge detection techniques. Comment: 14 pages, 5 figures, journal paper
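    A minimal sketch of the edge-ego-network construction described above, assuming networkx is available; the sparsity-based outlier score is our illustrative stand-in, not the paper's model-based algorithms:

```python
# Extract the edge-ego-network of an edge (u, v): the induced subgraph on
# u, v, and all of their neighbours. The score below is a toy heuristic.
import networkx as nx

def edge_ego_network(G, u, v):
    """Induced subgraph on the two endpoints of (u, v) and their neighbours."""
    nodes = {u, v} | set(G.neighbors(u)) | set(G.neighbors(v))
    return G.subgraph(nodes)

def edge_outlier_score(G, u, v):
    """Toy score: fraction of possible edges missing from the edge-ego-network.
    A sparse edge-ego-network suggests the edge links otherwise unrelated regions."""
    ego = edge_ego_network(G, u, v)
    n = ego.number_of_nodes()
    possible = n * (n - 1) / 2
    return 1.0 - ego.number_of_edges() / possible if possible else 0.0

G = nx.karate_club_graph()
scores = {(u, v): edge_outlier_score(G, u, v) for u, v in G.edges()}
print(max(scores, key=scores.get))  # edge with the sparsest ego-network: strongest outlier candidate
```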

    Searching for network modules

    When analyzing complex networks, a key target is to uncover their modular structure, which means searching for a family of modules, namely node subsets each spanning a subnetwork more densely connected than the average. This work proposes a novel type of objective function for graph clustering, in the form of a multilinear polynomial whose coefficients are determined by network topology. It may be thought of as a potential function, to be maximized, taking its values on fuzzy clusterings, or families of fuzzy subsets of nodes over which every node distributes a unit membership. When suitably parametrized, this potential is shown to attain its maximum when every node concentrates all of its unit membership on some module. The output is thus a partition, while the original discrete optimization problem is turned into a continuous version that allows alternative search strategies to be conceived. Since an instance of the problem is a pseudo-Boolean function assigning real-valued cluster scores to node subsets, modularity maximization is employed to exemplify a so-called quadratic form, in which the scores of singletons and pairs fully determine the scores of larger clusters, while the resulting multilinear polynomial potential function has degree 2. After considering further quadratic instances, different from modularity and obtained by interpreting network topology in alternative ways, a greedy local-search strategy for the continuous framework is analytically compared with an existing greedy agglomerative procedure for the discrete case. Overlapping is finally discussed in terms of multiple runs, i.e., several local searches with different initializations. Comment: 10 pages
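    For concreteness, the standard Newman-Girvan modularity, together with a degree-2 fuzzy relaxation of the kind the abstract describes, can be written as follows; the paper's exact parametrization may differ:

```latex
% Newman-Girvan modularity over a hard partition c, and a fuzzy relaxation in
% which node i assigns membership x_{iq} >= 0 to module q, with \sum_q x_{iq} = 1.
% The relaxed potential is a multilinear polynomial of degree 2 in the x_{iq}.
Q = \frac{1}{2m}\sum_{i,j}\Bigl(A_{ij}-\frac{k_i k_j}{2m}\Bigr)\,\delta(c_i,c_j),
\qquad
\widetilde{Q}(x) = \frac{1}{2m}\sum_{q}\sum_{i,j}\Bigl(A_{ij}-\frac{k_i k_j}{2m}\Bigr)\,x_{iq}\,x_{jq}.
```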

    Consensus clustering in complex networks

    The community structure of complex networks reveals both their organization and hidden relationships among their constituents. Most community detection methods currently available are not deterministic, and their results typically depend on the specific random seeds, initial conditions and tie-break rules adopted for their execution. Consensus clustering is used in data analysis to generate stable results out of a set of partitions delivered by stochastic methods. Here we show that consensus clustering can be combined with any existing method in a self-consistent way, considerably enhancing both the stability and the accuracy of the resulting partitions. This framework is also particularly suitable for monitoring the evolution of community structure in temporal networks. An application of consensus clustering to a large citation network of physics papers demonstrates its capability to keep track of the birth, death and diversification of topics. Comment: 11 pages, 12 figures. Published in Scientific Reports
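    A one-pass sketch of the consensus idea, assuming networkx >= 3 for Louvain; the number of runs and the 0.5 threshold are illustrative, and the full method iterates this step until the consensus matrix stabilizes:

```python
# Run a stochastic community detection method many times, count how often
# each node pair lands in the same community, then re-cluster the
# thresholded consensus graph.
import itertools
import networkx as nx
import numpy as np

def consensus_matrix(G, runs=50):
    nodes = list(G)
    idx = {u: i for i, u in enumerate(nodes)}
    D = np.zeros((len(nodes), len(nodes)))
    for seed in range(runs):
        for comm in nx.community.louvain_communities(G, seed=seed):
            for u, v in itertools.combinations(comm, 2):
                D[idx[u], idx[v]] += 1.0 / runs
                D[idx[v], idx[u]] += 1.0 / runs
    return nodes, idx, D

G = nx.karate_club_graph()
nodes, idx, D = consensus_matrix(G)
H = nx.Graph()
H.add_nodes_from(nodes)
H.add_edges_from((u, v) for u, v in itertools.combinations(nodes, 2)
                 if D[idx[u], idx[v]] > 0.5)  # keep pairs co-assigned in >50% of runs
stable = nx.community.louvain_communities(H, seed=0)
print(len(stable), "consensus communities")
```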

    Enabling near-atomic-scale analysis of frozen water

    Transmission electron microscopy has undergone a revolution in recent years, with the possibility to perform routine cryo-imaging of biological materials and (bio)chemical systems, as well as the possibility to image liquids via dedicated reaction cells or graphene sandwiching. These approaches, however, typically require imaging a large number of specimens and reconstructing an average representation, and they often lack analytical capabilities. Here, using atom probe tomography, we provide atom-by-atom analyses of frozen liquids and analytical sub-nanometre three-dimensional reconstructions. The analyzed ice is in contact with, and embedded within, nanoporous gold (NPG). We report the first such data on 2-3 micron-thick layers of ice formed from both high-purity deuterated water and a solution of 50 mM NaCl in high-purity deuterated water. We present a specimen preparation strategy that uses an NPG film and, additionally, we report on an analysis of the interface between nanoporous gold and the frozen salt-water solution, with an apparent trend in the Na and Cl concentrations across the interface. We explore a range of experimental parameters to show that atom probe analyses of bulk aqueous specimens come with their own special challenges, and we discuss physical processes that may produce the observed phenomena. Our study demonstrates the viability of using frozen water as a carrier for near-atomic-scale analysis of objects in solution by atom probe tomography.

    Communities as Well Separated Subgraphs With Cohesive Cores: Identification of Core-Periphery Structures in Link Communities

    Communities in networks are commonly considered to be highly cohesive subgraphs that are well separated from the rest of the network. However, cohesion and separation often cannot be maximized at the same time, which is why some methods seek a compromise between the two. When a compromise is not suitable for the problem to be solved, it may be advantageous to separate the two criteria. In this paper, we explore such an approach by defining communities as well-separated subgraphs that can have one or more cohesive cores surrounded by peripheries. We apply this idea to link communities, presenting an algorithm for constructing hierarchical core-periphery structures in link communities along with first test results. Comment: 12 pages, 2 figures, submitted version of a paper accepted for the 7th International Conference on Complex Networks and Their Applications, December 11-13, 2018, Cambridge, UK; revised version at http://141.20.126.227/~qm/papers
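    A sketch of the usual entry point for link communities, clustering the line graph L(G) whose nodes are the edges of G; k-core decomposition stands in here for the paper's core-periphery construction, which is not reproduced:

```python
# Find communities of links by clustering the line graph, then split each
# link community's induced subgraph into a dense core and a periphery.
import networkx as nx

G = nx.karate_club_graph()
L = nx.line_graph(G)                       # nodes of L are the edges of G
link_comms = nx.community.louvain_communities(L, seed=0)
for lc in link_comms:
    sub = G.edge_subgraph(lc)              # subgraph induced by one link community
    core = nx.k_core(sub)                  # densest part: a candidate "core"
    periphery = set(sub) - set(core)       # remaining nodes: the "periphery"
    print(len(lc), "links:", len(core), "core /", len(periphery), "periphery nodes")
```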

    Do logarithmic proximity measures outperform plain ones in graph clustering?

    We consider a number of graph kernels and proximity measures, including the commute time kernel, regularized Laplacian kernel, heat kernel, exponential diffusion kernel (also called "communicability"), etc., and the corresponding distances, as applied to clustering nodes in random graphs and several well-known datasets. The model for generating random graphs involves edge probabilities for pairs of nodes that belong to the same class or to different predefined classes of nodes. It turns out that in most cases, logarithmic measures (i.e., measures obtained by taking the logarithm of the proximities) perform better at distinguishing the underlying classes than the "plain" measures. A comparison in terms of reject curves of inter-class and intra-class distances confirms this conclusion. A similar conclusion can be drawn for several well-known datasets. A possible origin of this effect is that most kernels have a multiplicative nature, while the nature of distances used in clustering algorithms is additive (cf. the triangle inequality). The logarithmic transformation is a tool to convert the former nature into the latter. Moreover, some distances corresponding to the logarithmic measures possess a meaningful cutpoint additivity property. In our experiments, the leader is usually the logarithmic Communicability measure. However, we indicate some more complicated cases in which other measures, typically Communicability and plain Walk, can be the winners. Comment: 11 pages, 5 tables, 9 figures. Accepted for publication in the Proceedings of the 6th International Conference on Network Analysis, May 26-28, 2016, Nizhny Novgorod, Russia
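    A small sketch contrasting a plain kernel with its logarithmic transform, using the communicability kernel expm(A) named in the abstract; the kernel-to-distance conversion is the standard one, and scipy/networkx are assumed:

```python
# Compute the communicability kernel, its elementwise logarithm, and the
# kernel-induced distances; the log turns the multiplicative kernel additive.
import numpy as np
import networkx as nx
from scipy.linalg import expm

G = nx.karate_club_graph()
A = nx.to_numpy_array(G)

K_plain = expm(A)         # communicability kernel
K_log = np.log(K_plain)   # logarithmic communicability (entries of expm(A) are positive
                          # for a connected graph, so the log is well defined)

def kernel_to_distance(K):
    """Standard kernel-induced squared distance: D_ij = K_ii + K_jj - 2 K_ij."""
    d = np.diag(K)
    return d[:, None] + d[None, :] - 2 * K

D_plain = kernel_to_distance(K_plain)
D_log = kernel_to_distance(K_log)
# Either distance matrix can now be fed to a standard clustering algorithm.
```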

    Network segregation in a model of misinformation and fact checking

    Misinformation in the form of rumors, hoaxes, and conspiracy theories spreads on social media at alarming rates. One hypothesis is that, since social media are shaped by homophily, belief in misinformation may be more likely to thrive in those social circles that are segregated from the rest of the network. One possible antidote is fact checking, which, in some cases, is known to stop rumors from spreading further. However, fact checking may also backfire and reinforce belief in a hoax. Here we take into account the combination of network segregation, finite memory and attention, and fact-checking efforts. We consider a compartmental model of two interacting epidemic processes over a network that is segregated between gullible and skeptic users. Extensive simulations and mean-field analysis show that a more segregated network facilitates the spread of a hoax only at low forgetting rates, but has no effect when agents forget at faster rates. This finding may inform the development of mitigation techniques and shed light on the risks of uncontrolled misinformation online.
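    A schematic mean-field sketch of this kind of model, collapsed to a single well-mixed population (the paper's segregated gullible/skeptic network structure is omitted); compartments are susceptible S, believers B and fact-checkers F, and all parameter names and rates are illustrative:

```python
# Euler integration of a toy S/B/F compartmental model: believers and
# fact-checkers both recruit susceptibles, believers verify at rate pv,
# and everyone forgets back to susceptible at rate pf. Totals are conserved.
def simulate(steps=500, dt=0.1, beta=0.4, gamma=0.3, pv=0.05, pf=0.1):
    S, B, F = 0.98, 0.02, 0.0
    for _ in range(steps):
        dB = beta * S * B - (pv + pf) * B        # hoax spreading, verifying, forgetting
        dF = gamma * S * F + pv * B - pf * F     # fact-check spreading and forgetting
        dS = -(dB + dF)                          # conservation: S + B + F stays 1
        S, B, F = S + dt * dS, B + dt * dB, F + dt * dF
    return S, B, F

print(simulate())         # hoax prevalence at a low forgetting rate
print(simulate(pf=0.5))   # faster forgetting suppresses the believer fraction
```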

    A 19-SNP coronary heart disease gene score profile in subjects with type 2 diabetes: the coronary heart disease risk in type 2 diabetes (CoRDia study) study baseline characteristics

    Background: The coronary risk in diabetes (CoRDia) trial (n = 211) compares the effectiveness of usual diabetes care with a self-management intervention (SMI), with and without personalised risk information (including genetics), on clinical and behavioural outcomes. Here we present an assessment of randomisation, the cardiac risk genotyping assay, and the genetic characteristics of the recruits. / Methods: Ten-year coronary heart disease (CHD) risk was calculated using the UKPDS score. Genetic CHD risk was determined by genotyping 19 single nucleotide polymorphisms (SNPs) using Randox’s Cardiac Risk Prediction Array and calculating a gene score (GS). Accuracy of the array was assessed by genotyping a subset of pre-genotyped samples (n = 185). / Results: Overall, 10-year CHD risk ranged from 2–72 % but did not differ between the randomisation groups (p = 0.13). The array results were 99.8 % concordant with the pre-determined genotypes. The GS did not differ (p = 0.80) between the Caucasian participants in the CoRDia SMI plus risk group (n = 66) and a sample of healthy UK men (n = 1360). The GS was also associated with LDL-cholesterol (p = 0.05) and family history (p = 0.03) in the sample of healthy UK men. / Conclusions: CHD risk is high in this group of T2D subjects. The risk array is an accurate genotyping assay and is suitable for estimating an individual’s genetic CHD risk. / Trial registration: This study has been registered at ClinicalTrials.gov; registration identifier NCT0189178
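    A generic sketch of how a weighted SNP gene score of this kind is typically computed; the SNP identifiers and odds ratios below are placeholders, not the CoRDia 19-SNP panel or Randox's scoring:

```python
# Weighted gene score: sum the risk-allele counts (0, 1 or 2 per SNP),
# each weighted by that SNP's log odds ratio. All values are hypothetical.
import math

snp_weights = {
    "rs0000001": math.log(1.12),  # hypothetical SNP, hypothetical OR = 1.12
    "rs0000002": math.log(1.07),  # hypothetical SNP, hypothetical OR = 1.07
}

def gene_score(genotypes, weights):
    """genotypes: dict mapping SNP id -> risk-allele count in {0, 1, 2}."""
    return sum(weights[snp] * count for snp, count in genotypes.items())

print(gene_score({"rs0000001": 2, "rs0000002": 1}, snp_weights))
```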

    Emergence of scale-free close-knit friendship structure in online social networks

    Although the structural properties of online social networks have attracted much attention, the properties of close-knit friendship structures remain an important open question. Here, we mainly focus on how these mesoscale structures are affected by local and global structural properties. Analyzing data from four large-scale online social networks reveals several common structural properties. We find that not only do the local structures given by the indegree, outdegree, and reciprocal degree distributions follow a similar scaling behavior, but the mesoscale structures represented by the distributions of close-knit friendship structures also exhibit a similar scaling law. The degree correlation is very weak over a wide range of degrees. We propose a simple directed network model that captures the observed properties. The model incorporates two mechanisms: reciprocation and preferential attachment. Through rate-equation analysis of our model, the local-scale and mesoscale structural properties are derived. At the local scale, the same scaling behavior of the indegree and outdegree distributions stems from the indegree and outdegree of nodes both growing as the same function of the introduction time, and the reciprocal degree distribution shows the same power law owing to the linear relationship between the reciprocal degree and the in/outdegree of nodes. At the mesoscale, the distributions of the four closed triples representing close-knit friendship structures are found to exhibit identical power laws, a behavior attributed to the negligible degree correlations. Intriguingly, all the power-law exponents of the local-scale and mesoscale distributions depend only on one global parameter -- the mean in/outdegree -- while the mean in/outdegree and the reciprocity together determine the ratio of the reciprocal degree of a node to its in/outdegree. Comment: 48 pages, 34 figures
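    A minimal sketch of a directed growth model combining the two stated mechanisms, indegree-preferential attachment plus probabilistic reciprocation; the attachment kernel and parameter values are illustrative, not the paper's rate equations:

```python
# Grow a directed network: each new node sends m links to targets chosen in
# proportion to indegree, and each target reciprocates with probability p_recip.
import random

def grow(n=2000, m=3, p_recip=0.5, seed=1):
    rng = random.Random(seed)
    edges = {(0, 1), (1, 0)}        # reciprocal seed pair
    targets = [0, 1]                # each node listed once per incoming edge
    for new in range(2, n):
        for _ in range(m):
            t = rng.choice(targets)              # indegree-preferential choice
            if t != new and (new, t) not in edges:
                edges.add((new, t))
                targets.append(t)
                if rng.random() < p_recip:       # target reciprocates the link
                    edges.add((t, new))
                    targets.append(new)
    return edges

print(len(grow()), "directed edges")
```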

    Development of an exercise intervention for the prevention of musculoskeletal shoulder problems after breast cancer treatment : the prevention of shoulder problems trial (UK PROSPER)

    Background Musculoskeletal shoulder problems are common after breast cancer treatment. There is some evidence to suggest that early postoperative exercise is safe and may improve shoulder function. We describe the development and delivery of a complex intervention for evaluation within a randomised controlled trial (RCT), designed to target the prevention of musculoskeletal shoulder problems after breast cancer surgery (the Prevention of Shoulder Problems Trial; PROSPER). Methods A pragmatic, multicentre RCT to compare the clinical and cost-effectiveness of best-practice usual care versus a physiotherapy-led exercise and behavioural support intervention in women at high risk of shoulder problems after breast cancer treatment. PROSPER will recruit 350 women from approximately 15 UK centres, with follow-up at 6 and 12 months. The primary outcome is shoulder function at 12 months; secondary outcomes include postoperative pain, health-related quality of life, adverse events and healthcare resource use. A multi-phased approach was used to develop the PROSPER intervention, which was underpinned by existing evidence and modified for implementation after input from clinical experts and women with breast cancer. The intervention was tested and refined further after qualitative interviews with patients newly diagnosed with breast cancer; a pilot RCT was then conducted at three UK clinical centres. Discussion The PROSPER intervention incorporates three main components: shoulder-specific exercises targeting range of movement and strength; general physical activity; and behavioural strategies to encourage adherence and support exercise behaviour. The final PROSPER intervention is fully manualised, with clear, documented pathways for clinical assessment, exercise prescription and the use of behavioural strategies, and with guidance for the treatment of postoperative complications. This paper adheres to the TIDieR and CERT recommendations for the transparent, comprehensive and explicit reporting of complex interventions. Trial registration: International Standard Randomised Controlled Trial Number: ISRCTN 35358984