307 research outputs found

    Mining and analysis of real-world graphs

    Networked systems are everywhere, such as the Internet, social networks, biological networks, transportation networks, and power grid networks. They can be very large and enormously complex. They can contain a lot of information, either open and transparent or hidden and coded. Such real-world systems can be modeled using graphs and mined and analyzed through the lens of network analysis. Network analysis can be applied to recognize frequent patterns among the connected components of a large graph, such as a social network, where visual analysis is almost impossible. Frequent patterns illuminate statistically important subgraphs that are usually small enough to analyze visually. Graph mining has practical applications in fraud detection, outlier detection, the analysis of chemical molecules, and other areas, depending on the information that needs to be extracted and understood. Network analysis can also be used to quantitatively evaluate and improve the resilience of infrastructure networks such as the Internet or power grids. Infrastructure networks directly affect the quality of people's lives, and a disastrous incident in these networks may lead to a cascading breakdown of the whole network and serious economic consequences. In essence, network analysis can help us gain actionable insights and make better data-driven decisions based on the networks. On that note, the objective of this dissertation is to improve upon existing tools for more accurate mining and analysis of real-world networks. --Abstract, page iv

    Intelligent Citizenship Identity through Family Pedigree Using Graph-Signature Based Random-Forest Model

    There has been a global upsurge of interest in the topic of citizenship identity over the past decades, specifically in a world dominated by profound insecurity, inequalities, proliferation of identities, and the rise of identity politics, engendered by capitalism. However, finding an effective solution to these problems has proved difficult. To alleviate these problems, this paper presents an analytical machine learning model that combines the graph signature with random forest techniques. This study presents the design and realization of a novel Intelligent Citizenship Identity through family pedigree using a Graph Signature based Random Forest (GSB-RF) model. The study also showcases the development of a novel graph signature technique referred to as the Canonical Code Signature (CCS) method. The CCS method is used at the pre-processing stage of the identification process to build a signature for any given tuple. A performance comparison between the present system and the baseline techniques, which include K-Nearest Neighbour and the traditional Random Forest, shows that the present system outperformed the baseline methods studied. The proposed system shows the capability to perform continuous re-identification of citizens based on their family pedigree, with the ability to select the best sample with low computational complexity, high identification accuracy and speed. Our experimental results show that the precision rate and identification quality of our system are in most cases equal to or greater than 70%. Therefore, the proposed Citizenship Identification machine is capable of providing usable, consistent, efficient, faster and accurate identification to users, security agents, government agents and institutions online, in real time and at any time.

    Mining and modeling graphs using patterns and priors

    No full text

    Geometric, Feature-based and Graph-based Approaches for the Structural Analysis of Protein Binding Sites : Novel Methods and Computational Analysis

    In this thesis, protein binding sites are considered. To enable the extraction of information from the space of protein binding sites, these binding sites must be mapped onto a mathematical space. This can be done by mapping binding sites onto vectors, graphs or point clouds. To impose a structure on the mathematical space, a distance measure is required, which is introduced in this thesis. This distance measure can then be used to extract information by means of data mining techniques.

    Fundamentals

    Volume 1 establishes the foundations of this new field. It goes through all the steps from data collection, their summary and clustering, to different aspects of resource-aware learning, i.e., hardware, memory, energy, and communication awareness. Machine learning methods are inspected with respect to resource requirements and how to enhance scalability on diverse computing architectures ranging from embedded systems to large computing clusters

    Optimizing graph query performance by indexing and caching

    Subgraph/supergraph queries, though central to graph analytics, are costly as they entail the NP-complete problem of subgraph isomorphism. To expedite graph query processing, the community has contributed a wealth of approaches that fall into two categories: heuristic subgraph isomorphism (SI) methods and algorithms following the “filter-then-verify” (FTV) paradigm. However, both bear performance limitations. A significant drawback of current studies is that they throw away the results obtained when executing previous graph queries. To this end, this work presents a fresh solution named iGQ, the principle of which is to acquire and utilize knowledge from the results of previously executed queries. iGQ encompasses two component subindexes that identify whether a new query is a subgraph or supergraph of previously executed queries, so that the stored knowledge can be used to accelerate the execution of the new query by reducing the number of subgraph isomorphism tests to be performed. The correctness of iGQ is assured by formal proof. Moreover, iGQ can be used for both subgraph and supergraph query processing, bridging the two separate research threads in the community. Separately, using a cache to accelerate query processing has been prevalent in data management systems. In the realm of graph-structured queries, however, little work has been done. Meanwhile, modern big data applications are emerging and demanding high-performance graph query processing. Therefore, this thesis puts forth a full-fledged graph caching system named GraphCache for graph queries. From the ground up, GraphCache is designed as a semantic graph cache that can harness both subgraph and supergraph cache hits, expanding on traditional hits confined to exact matches.
GraphCache features well-defined subsystems and interfaces, allowing for the flexibility of plugging in any general subgraph/supergraph query solution, be it an FTV algorithm or an SI method. Furthermore, GraphCache incorporates iGQ as the engine of query processing, where previously issued queries are leveraged to expedite graph query processing. With the continuous arrival of queries and finite memory space, GraphCache requires mechanisms to manage the space effectively, which in turn raises the problem of cache replacement. However, none of the existing replacement policies were developed specifically for a graph cache. This work hence proposes a number of graph-query-aware strategies with different trade-offs and emphasizes a novel hybrid replacement policy with competitive performance. Following the established research in the literature, GraphCache handles graph queries against a static dataset, i.e., all graphs in the underlying dataset remain untouched during the continual arrival and execution of queries. However, in real-world applications, the graph dataset naturally evolves over time. This poses a significant challenge for the current graph caching technique and hence gives rise to the need for advanced systems that are capable of accelerating subgraph/supergraph queries against dynamic datasets. To address the problem, this work contributes an upgraded graph caching system, namely GraphCache+, stressing the newly plugged-in subsystems and components that deal with the consistency of the graph cache. GraphCache+ is characterized by its two cache models, which represent different designs for ensuring graph cache consistency, as well as novel logic for accelerating subgraph and supergraph query processing, with formal proof of correctness.
Additionally, this work is accompanied by comprehensive performance evaluations of GraphCache/GraphCache+ with over 6 million queries against both real-world and synthetic datasets with different characteristics, revealing a number of non-trivial lessons. Overall, this work contributes to the community from three perspectives: it provides a fresh idea to expedite graph query processing, applicable to both SI methods and FTV algorithms; it presents GraphCache, to the best of our knowledge the first full-fledged graph caching system for general subgraph/supergraph queries; and it explores the topic of graph cache consistency, putting forth a systematic solution, GraphCache+.
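
The reuse of earlier query results described in the abstract above can be illustrated with a small sketch. This is not the iGQ or GraphCache implementation; the names `make_graph`, `is_subgraph`, `QueryCache` and `tests_run` are hypothetical, the brute-force `is_subgraph` merely stands in for a real SI or FTV method, and node labels are assumed to be sortable with no self-loops. It relies on two facts about subgraph queries: if a new query is contained in a past query, every answer of the past query is also an answer of the new one; if a past query is contained in the new query, only the past query's answers can still qualify.

```python
from itertools import permutations


def make_graph(edges):
    """Build an undirected graph as (node set, set of frozenset edges)."""
    nodes = {v for e in edges for v in e}
    return nodes, {frozenset(e) for e in edges}


def is_subgraph(small, big):
    """Brute-force (non-induced) subgraph isomorphism test.

    Stand-in for a real SI/FTV method; only viable for tiny graphs.
    """
    s_nodes, s_edges = small
    b_nodes, b_edges = big
    if len(s_nodes) > len(b_nodes) or len(s_edges) > len(b_edges):
        return False
    s_list = sorted(s_nodes)
    # Try every injective mapping of small's nodes into big's nodes.
    for image in permutations(sorted(b_nodes), len(s_list)):
        mapping = dict(zip(s_list, image))
        if all(frozenset(mapping[v] for v in e) in b_edges for e in s_edges):
            return True
    return False


class QueryCache:
    """Hypothetical cache of past subgraph queries and their answer sets."""

    def __init__(self, dataset):
        self.dataset = dataset  # dict: graph id -> graph
        self.history = []       # list of (query graph, set of answer ids)
        self.tests_run = 0      # verification tests run against the dataset

    def subgraph_query(self, query):
        """Return ids of dataset graphs that contain `query`."""
        candidates = set(self.dataset)
        known = set()
        for past, answers in self.history:
            if is_subgraph(query, past):
                # query is inside past, past is inside each answer graph.
                known |= answers
            if is_subgraph(past, query):
                # Any graph containing query must also contain past.
                candidates &= answers
        result = set(known)
        for gid in candidates - known:
            self.tests_run += 1
            if is_subgraph(query, self.dataset[gid]):
                result.add(gid)
        self.history.append((query, result))
        return result


# Toy dataset (assumed for illustration only).
dataset = {
    "g1": make_graph([(1, 2), (2, 3), (1, 3)]),          # triangle
    "g2": make_graph([(1, 2), (2, 3)]),                  # path on 3 nodes
    "g3": make_graph([(1, 2)]),                          # single edge
    "g4": make_graph([(1, 2), (2, 3), (3, 4), (4, 1)]),  # 4-cycle
}
path = make_graph([(1, 2), (2, 3)])
edge = make_graph([(1, 2)])
triangle = make_graph([(1, 2), (2, 3), (1, 3)])
```

Issuing the `path`, `edge` and `triangle` queries in that order against one `QueryCache(dataset)` runs 4, 1 and 3 dataset tests respectively, instead of 4 each without reuse. The sketch ignores the cost of comparing a new query against the history, which a real system would have to index, and it does not model cache replacement or dataset updates.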
