6 research outputs found

    Frequent Subgraph Mining via Sampling with Rigorous Guarantees

    Frequent subgraph mining is a fundamental task in the analysis of collections of graphs that aims at finding all the subgraphs whose frequency in the dataset exceeds a user-specified threshold. While several exact approaches have been proposed to solve the task, it remains computationally challenging on large graph datasets due to the complexity of the subgraph isomorphism problem inherent in the task and the huge number of candidate patterns even for fairly small subgraphs. In this thesis, we study two statistical learning measures of complexity, VC-dimension and Rademacher averages, for subgraphs, and derive efficiently computable bounds for both. We then show how such bounds can be applied to devise efficient sampling-based approaches for rigorously approximating the solutions of the frequent subgraph mining problem, providing sample sizes which are much tighter than what would be obtained by a straightforward application of Chernoff and union bounds. We also show that our bounds can be used for true frequent subgraph mining, which requires identifying subgraphs generated with probability above a given threshold using samples from an unknown generative process. Moreover, we carry out an extensive experimental evaluation of our methods on real datasets, which shows that our bounds lead to efficiently computable and high-quality approximations for both applications.
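    To make the sample-size comparison in this abstract concrete, here is a minimal sketch that is not taken from the thesis. It contrasts a Hoeffding-plus-union-bound sample size over a finite set of candidate patterns with the standard VC-dimension bound for an epsilon-approximation. The function names, the constant `c = 0.5`, and the example values are illustrative assumptions; the thesis derives its own bounds, which may differ.

```python
import math


def union_bound_sample_size(num_patterns: int, eps: float, delta: float) -> int:
    """Sample size from Hoeffding's inequality plus a union bound.

    Guarantees that, with probability at least 1 - delta, the sampled
    frequency of each of num_patterns candidates is within eps of its
    true frequency. Assumes the candidate set is finite and known.
    """
    return math.ceil(math.log(2 * num_patterns / delta) / (2 * eps ** 2))


def vc_sample_size(vc_dim: int, eps: float, delta: float, c: float = 0.5) -> int:
    """Sample size from the classical VC epsilon-approximation bound.

    Uses m >= (c / eps^2) * (d + ln(1 / delta)). The constant c is an
    assumption here (0.5 is a value often quoted in the literature).
    """
    return math.ceil(c / eps ** 2 * (vc_dim + math.log(1 / delta)))


if __name__ == "__main__":
    # Hypothetical setting: one million candidate subgraphs, and an
    # assumed upper bound of 10 on the VC-dimension.
    eps, delta = 0.05, 0.1
    print("union bound:", union_bound_sample_size(10 ** 6, eps, delta))
    print("VC bound:   ", vc_sample_size(10, eps, delta))
```

    The union-bound size grows with the logarithm of the number of candidate patterns, while the VC-based size depends only on the VC-dimension bound. The VC-based size is therefore smaller whenever that bound is small relative to the logarithm of the candidate count.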

    Socio-Cognitive and Affective Computing

    Social cognition focuses on how people process, store, and apply information about other people and social situations, and on the role that cognitive processes play in social interactions. The term cognitive computing, on the other hand, is generally used to refer to new hardware and/or software that mimics the functioning of the human brain and helps to improve human decision-making. In this sense, it is a type of computing with the goal of discovering more accurate models of how the human brain/mind senses, reasons, and responds to stimuli. Socio-Cognitive Computing should be understood as a set of interdisciplinary theoretical frameworks, methodologies, methods, and hardware/software tools to model how the human brain mediates social interactions. In addition, Affective Computing is the study and development of systems and devices that can recognize, interpret, process, and simulate human affects, a fundamental aspect of socio-cognitive neuroscience. It is an interdisciplinary field spanning computer science, electrical engineering, psychology, and cognitive science. Physiological Computing is a category of technology in which electrophysiological data recorded directly from human activity are used to interface with a computing device. This technology becomes even more relevant when computing can be integrated pervasively into everyday life environments. Thus, Socio-Cognitive and Affective Computing systems should be able to adapt their behavior according to the Physiological Computing paradigm. This book integrates proposals from researchers who use signals from the brain and/or body to infer people's intentions and psychological state in smart computing systems. The design of this kind of system combines knowledge and methods of ubiquitous and pervasive computing, as well as physiological data measurement and processing, with those of socio-cognitive and affective computing.

    A Parallel Approach for Frequent Subgraph Mining in a Single Large Graph Using Spark

    Frequent subgraph mining (FSM) plays an important role in graph mining and has attracted a great deal of attention in many areas, such as bioinformatics, web data mining, and social networks. In this paper, we propose SSiGraM (Spark based Single Graph Mining), a Spark-based parallel algorithm for frequent subgraph mining in a single large graph. To address the two computational challenges of FSM, we conduct subgraph extension and support evaluation in parallel across all the worker nodes of the distributed cluster. In addition, we employ a heuristic search strategy and three novel optimizations in the support evaluation process: load balancing, pre-search pruning, and top-down pruning, which significantly improve performance. Extensive experiments with four different real-world datasets demonstrate that the proposed algorithm outperforms the existing GraMi (Graph Mining) algorithm by an order of magnitude on all datasets and can work with a lower support threshold.
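    As a rough illustration of what support evaluation in a single large graph can involve, the sketch below computes the minimum image-based (MNI) support of a pattern from a list of its embeddings. This abstract does not say which support measure SSiGraM uses; MNI is assumed here because it is the measure used by GraMi, the baseline named above. The function name and input format are hypothetical, and the sketch is sequential rather than Spark-based.

```python
from typing import Dict, Hashable, List, Set


def mni_support(embeddings: List[Dict[Hashable, Hashable]]) -> int:
    """Minimum image-based support of a pattern.

    Each embedding maps every pattern vertex to the data-graph vertex
    it is matched to. The support is the smallest number of distinct
    data-graph vertices that any single pattern vertex is mapped to.
    """
    if not embeddings:
        return 0
    images: Dict[Hashable, Set[Hashable]] = {}
    for embedding in embeddings:
        for pattern_vertex, graph_vertex in embedding.items():
            images.setdefault(pattern_vertex, set()).add(graph_vertex)
    return min(len(vertices) for vertices in images.values())


if __name__ == "__main__":
    # Hypothetical embeddings of a single-edge pattern (a, b).
    found = [{"a": 1, "b": 2}, {"a": 1, "b": 3}, {"a": 4, "b": 2}]
    print(mni_support(found))
```

    A pattern would then be reported as frequent when this value reaches the chosen support threshold.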
