54 research outputs found

    Efficient Approximation Algorithms for Spanning Centrality

    Given a graph G, the spanning centrality (SC) of an edge e measures the importance of e for G to be connected. In practice, SC has seen extensive applications in computational biology, electrical networks, and combinatorial optimization. However, it is highly challenging to compute the SC of all edges (AESC) on large graphs. Existing techniques fail to deal with such graphs, as they either suffer from expensive matrix operations or require sampling numerous long random walks. To circumvent these issues, this paper proposes TGT and its enhanced version TGT+, two algorithms for AESC computation that offer rigorous theoretical approximation guarantees. In particular, TGT remedies the deficiencies of previous solutions by conducting deterministic graph traversals with carefully crafted truncated lengths. TGT+ further advances TGT in terms of both empirical efficiency and asymptotic performance while retaining result quality, based on the combination of TGT with random walks and several additional heuristic optimizations. We experimentally evaluate TGT+ against recent competitors for AESC using a variety of real datasets. The experimental outcomes confirm that TGT+ outperforms the state of the art, often by over one order of magnitude in speed, without degrading accuracy. Comment: The technical report of the paper entitled 'Efficient Approximation Algorithms for Spanning Centrality' in SIGKDD'2
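To make the quantity in this abstract concrete: for an unweighted connected graph, the SC of an edge is the fraction of spanning trees that contain it. The sketch below computes it exactly on a tiny graph by counting spanning trees with Kirchhoff's matrix-tree theorem. It is not TGT or TGT+, whose details the abstract does not give; the function names `count_spanning_trees` and `spanning_centrality` are hypothetical, and the approach is only practical for very small graphs.

```python
from fractions import Fraction


def count_spanning_trees(n, edges):
    """Count spanning trees of an undirected graph on nodes 0..n-1
    using Kirchhoff's matrix-tree theorem (exact rational arithmetic)."""
    if n == 1:
        return Fraction(1)
    # Build the Laplacian L = D - A.
    lap = [[Fraction(0)] * n for _ in range(n)]
    for u, v in edges:
        lap[u][u] += 1
        lap[v][v] += 1
        lap[u][v] -= 1
        lap[v][u] -= 1
    # Drop the last row and column; the determinant of this minor
    # equals the number of spanning trees.
    m = [row[:-1] for row in lap[:-1]]
    size = n - 1
    det = Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # singular minor: the graph is disconnected
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return det


def spanning_centrality(n, edges):
    """Exact SC of every edge: 1 - trees(G without e) / trees(G).
    Assumes G is connected and has no parallel edges."""
    total = count_spanning_trees(n, edges)
    result = {}
    for i, e in enumerate(edges):
        without_e = count_spanning_trees(n, edges[:i] + edges[i + 1:])
        result[e] = 1 - without_e / total
    return result


# A triangle (0, 1, 2) with a pendant edge (2, 3).
sc = spanning_centrality(4, [(0, 1), (1, 2), (0, 2), (2, 3)])
```

In this example the pendant edge (2, 3) is a bridge, so it lies in every spanning tree, while each triangle edge lies in two of the three spanning trees. Each call recomputes a determinant per edge, which is exactly the kind of expensive matrix operation the abstract says does not scale.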

    Scaling Attributed Network Embedding to Massive Graphs

    Given a graph G where each node is associated with a set of attributes, attributed network embedding (ANE) maps each node v in G to a compact vector Xv, which can be used in downstream machine learning tasks. Ideally, Xv should capture node v's affinity to each attribute, which considers not only v's own attribute associations, but also those of its connected nodes along edges in G. It is challenging to obtain high-utility embeddings that enable accurate predictions; scaling effective ANE computation to massive graphs with millions of nodes pushes the difficulty of the problem to a whole new level. Existing solutions largely fail on such graphs, leading to prohibitive costs, low-quality embeddings, or both. This paper proposes PANE, an effective and scalable approach to ANE computation for massive graphs that achieves state-of-the-art result quality on multiple benchmark datasets, measured by the accuracy of three common prediction tasks: attribute inference, link prediction, and node classification. PANE obtains high scalability and effectiveness through three main algorithmic designs. First, it formulates the learning objective based on a novel random walk model for attributed networks. The resulting optimization task is still challenging on large graphs. Second, PANE includes a highly efficient solver for the above optimization problem, whose key module is a carefully designed initialization of the embeddings, which drastically reduces the number of iterations required to converge. Finally, PANE utilizes multi-core CPUs through non-trivial parallelization of the above solver, which achieves scalability while retaining the high quality of the resulting embeddings. Extensive experiments, comparing 10 existing approaches on 8 real datasets, demonstrate that PANE consistently outperforms all existing methods in terms of result quality, while being orders of magnitude faster. Comment: 16 pages. PVLDB 2021. Volume 14, Issue

    Efficient and scalable techniques for pagerank-based graph analytics

    Graphs are ubiquitous today and are a fundamental data structure to represent objects and their relations in various domains, e.g., social science, citation analysis, weblink analysis, and biological informatics. PageRank-based techniques such as personalized PageRank, heat kernel PageRank, TrustRank have been well studied and shown great efficacy in graph processing tasks including web ranking, recommender system, graph clustering and combating web spam. In this thesis, we investigate three important problems in graph processing by exploiting PageRank-based techniques, namely, local graph clustering, homogeneous network embedding and attributed network embedding. These three problems are not only interesting in themselves when the graph size becomes large, but also find numerous applications in both academia and industry. First, we study the local graph clustering on undirected graphs. Given an undirected graph G and a seed node s, the local clustering problem aims to identify a high-quality cluster containing s in time roughly proportional to the size of the cluster, regardless of the size of G. Recently, heat kernel PageRank (HKPR) is applied to this problem and found to be more efficient compared with prior methods. However, existing solutions for computing HKPR either are prohibitively expensive or provide unsatisfactory error approximation on HKPR values, rendering them impractical especially on billion-edge graphs. Thus, we present TEA and TEA+, which utilize deterministic graph traversal to produce a rough estimation of the exact HKPR vector, and then exploit Monte-Carlo random walks to refine the results in an optimized and non-trivial way. In particular, TEA+ offers practical efficiency and effectiveness due to non-trivial optimizations. Second, we investigate the homogeneous network embedding (HNE) problem. Given an input graph G and a node v ∈ G, HNE aims to map the graph structure in the vicinity of v to a fixed-dimensional feature vector. Existing approaches to HNE are either immensely expensive, and, thus, are limited to small graphs or fail to fully capture the local graph structure, leading to limited effectiveness of the extracted feature vectors. Meanwhile, in recent years there have been rapid advancements in scalable algorithms for computing approximate personalized PageRank (PPR), which captures rich graph topological information. However, PPR was designed for a very different purpose, i.e., ranking nodes in G based on their relative importance from a source node's perspective. In contrast, HNE aims to build node embeddings considering the whole graph. Consequently, node embeddings derived directly from PPR are of suboptimal utility. Motivated by this, we propose Node-Reweighted PageRank (NRP), a novel solution that transforms PPR values into effective node embeddings, by iteratively solving an optimization problem. Finally, we consider the attributed network embedding (ANE) problem. Given a graph G where each node is associated with a set of attributes, ANE maps each node v in G to a compact vector Xv, which can be used in downstream machine learning tasks. Existing solutions largely fail on such graphs, leading to prohibitive costs, low-quality embeddings, or both. We propose PANE, an effective and scalable approach to ANE computation for massive graphs. PANE obtains high scalability and effectiveness through three main algorithmic designs. First, it formulates the learning objective based on a novel node attribute PageRank model for attributed networks. The resulting optimization task is still challenging on large graphs. Second, PANE includes a highly efficient solver for the above optimization problem, whose key module is a carefully designed initialization of the embeddings, which drastically reduces the number of iterations required to converge. Finally, PANE utilizes multi-core CPUs through non-trivial parallelization of the above solver, which achieves scalability while retaining the high quality of the resulting embeddings. Doctor of Philosoph
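For readers unfamiliar with personalized PageRank (PPR), which this thesis abstract describes as ranking nodes from a source node's perspective, here is a minimal power-iteration sketch. It is a textbook formulation, not the thesis's NRP, TEA, or PANE. The function name `personalized_pagerank`, the restart probability `alpha=0.15`, the iteration count, and the choice to send dangling-node mass back to the source are all illustrative assumptions.

```python
def personalized_pagerank(adj, source, alpha=0.15, iterations=100):
    """Approximate PPR scores with respect to `source` by power iteration.

    adj maps each node to a list of its out-neighbors. At every step the
    walk restarts at `source` with probability alpha (an assumed value),
    otherwise it moves to a uniformly random out-neighbor.
    """
    nodes = list(adj)
    ppr = {v: 0.0 for v in nodes}
    ppr[source] = 1.0
    for _ in range(iterations):
        nxt = {v: 0.0 for v in nodes}
        nxt[source] += alpha  # restart mass (scores always sum to 1)
        for u in nodes:
            out = adj[u]
            if not out:
                # Assumption: a node without out-neighbors returns its
                # mass to the source so that total probability is kept.
                nxt[source] += (1 - alpha) * ppr[u]
                continue
            share = (1 - alpha) * ppr[u] / len(out)
            for v in out:
                nxt[v] += share
        ppr = nxt
    return ppr


# Undirected path 0 - 1 - 2, viewed from node 0.
scores = personalized_pagerank({0: [1], 1: [0, 2], 2: [1]}, source=0)
```

On this path the two endpoints are structurally identical, yet node 0 scores higher than node 2 because the walk restarts there. This source-dependence is what the abstract means when it says PPR was designed for a different purpose than whole-graph embeddings.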

    Establishment of Care System for Hemophilia in China: Current Status and Future Prospect

    Hemophilia is an X-linked recessive hereditary bleeding disorder. Patients need replacement treatment with coagulation factors throughout their lives. The medical care of hemophilia depends on the awareness of medical professionals, patients, and their family members; on access to medication for treatment; on insurance policies; and on other factors. This article presents how medical care for hemophilia took shape in China, including joining the World Federation of Hemophilia (WFH), forming the Hemophilia Treatment Center Collaborative Network of China (HTCCNC), initiating the national hemophilia registry system, and organizing hemophilia patient associations. The article also presents the clinical practice of the tiered care system for hemophilia in China, providing a reference for medical professionals and policy makers involved in the care of rare diseases in China.

    Establishment and Evolution of China National Hemophilia Registry

    Hemophilia is an inherited bleeding disorder and a type of rare disease that is hereditary, lifelong and disabling. The establishment of a National Hemophilia Registry is foundational to treating hemophilia. The initial hemophilia registry in China was established using paper forms in 1996 and upgraded to an online system in 2007. Following the decision of China's Ministry of Health in 2009 to establish a national hemophilia case information management system, China officially established a National Hemophilia Registry based on previous work. More than 200 hospitals have been involved in this work. The National Hemophilia Registry also provides the basis for the study of hemophilia epidemiology, disease characteristics and related policy formulation.

    Growing success and ambitions of the Chinese edition of Haemophilia.

    The success of Haemophilia cannot be appreciated without considering the importance of its Chinese edition. Both of us, as the Editor-in-Chief of Haemophilia and the editor of the translated Chinese edition, would like to congratulate all the board members of the Chinese edition for their major contributions to the journal and its success. The Chinese edition of Haemophilia represents an important asset. Two issues of the Chinese edition will be published in 2020; these issues are expected to increase the visibility of several papers published in Haemophilia. [...

    IP-10 and MCP-1 gene polymorphisms in Chinese patients with chronic immune thrombocytopenia

    Aberrant Th1/Th2 polarization is considered to play a crucial role in the abnormal immune state of primary immune thrombocytopenia (ITP). IFN-γ-inducible protein of 10 kilodaltons (IP-10) and Monocyte chemoattractant protein-1 (MCP-1) gene are involved in enhancing the Th1 and Th2 immune response, respectively. In this study we investigated the distributions of IP-10 (-201 G/A) and MCP-1 (-2518 A/G) polymorphisms in 323 patients with chronic ITP and 255 healthy controls by polymerase chain reaction (PCR)-restriction fragment length polymorphism (RFLP). The IP-10 and MCP-1 levels of blood serum from 79 adult ITP patients and 43 healthy controls were detected with ELISA. The frequency of AG + AA genotype in IP-10 (-201 G/A) was significantly higher in ITP patients than in controls, especially in female and adult patients. ITP patients showed higher IP-10 levels than normal controls. Moreover, both IP-10 (-201 G/A) heterozygote (GA) and homozygote minor allele (AA) patients had significantly increased IP-10 levels compared to homozygote genotype (GG) patients at diagnosis. No significant differences were revealed in genotypes and allele distributions of MCP-1 (-2518 A/G) between ITP patients and normal controls, as well as the MCP-1 levels. In conclusion, the -201 G/A polymorphism of IP-10 gene may be associated with the susceptibility of ITP in Chinese population.