270 research outputs found
Multi-graph-view subgraph mining for graph classification
© 2015, Springer-Verlag London. In this paper, we formulate a new multi-graph-view learning task, where each object to be classified contains graphs from multiple graph-views. This problem setting is essentially different from traditional single-graph-view graph classification, where graphs are collected from one single-feature view. To solve the problem, we propose a cross graph-view subgraph feature-based learning algorithm that explores an optimal set of subgraphs, across multiple graph-views, as features to represent graphs. Specifically, we derive an evaluation criterion to estimate the discriminative power and redundancy of subgraph features across all views, with a branch-and-bound algorithm being proposed to prune subgraph search space. Because graph-views may complement each other and play different roles in a learning task, we assign each view with a weight value indicating its importance to the learning task and further use an optimization process to find optimal weight values for each graph-view. The iteration between cross graph-view subgraph scoring and graph-view weight updating forms a closed loop to find optimal subgraphs to represent graphs for multi-graph-view learning. Experiments and comparisons on real-world tasks demonstrate the algorithm’s superior performance
Boosting for multi-graph classification
© 2014 IEEE. In this paper, we formulate a novel graph-based learning problem, multi-graph classification (MGC), which aims to learn a classifier from a set of labeled bags each containing a number of graphs inside the bag. A bag is labeled positive, if at least one graph in the bag is positive, and negative otherwise. Such a multi-graph representation can be used for many real-world applications, such as webpage classification, where a webpage can be regarded as a bag with texts and images inside the webpage being represented as graphs. This problem is a generalization of multi-instance learning (MIL) but with vital differences, mainly because instances in MIL share a common feature space whereas no feature is available to represent graphs in a multi-graph bag. To solve the problem, we propose a boosting based multi-graph classification framework (bMGC). Given a set of labeled multi-graph bags, bMGC employs dynamic weight adjustment at both bag- and graph-levels to select one subgraph in each iteration as a weak classifier. In each iteration, bag and graph weights are adjusted such that an incorrectly classified bag will receive a higher weight because its predicted bag label conflicts to the genuine label, whereas an incorrectly classified graph will receive a lower weight value if the graph is in a positive bag (or a higher weight if the graph is in a negative bag). Accordingly, bMGC is able to differentiate graphs in positive and negative bags to derive effective classifiers to form a boosting model for MGC. Experiments and comparisons on real-world multi-graph learning tasks demonstrate the algorithm performance
Data-Mining Techniques for Call-Graph-Based Software-Defect Localisation
Defect localisation is an important problem in software engineering. This dissertation investigates call-graph-mining-based software defect localisation, which supports software developers by providing hints where defects might be located. It extends the state-of-the-art by proposing new graph representations and mining techniques for weighted graphs. This leads to a broader range of detectable defects, to an increased localisation precision and to enhanced scalability
Recommended from our members
Identifying Important Life Events from Twitter Using Semantic and Syntactic Patterns
Identifying global events from social media has been the focus of much research in recent years. However, the identification of personal life events poses new requirements and challenges that have received relatively little research attention. In this paper we explore a new approach for life event identification, where we expand social media posts into both semantic, and syntactic networks of content. Frequent graph patterns are mined from these networks and used as features to enrich life-event classifiers. Results show that our approach significantly outperforms the best performing baseline in accuracy (by 4.48% points) and F-measure (by 4.54% points) when used to identify five major life events identified from the psychology literature: Getting Married, Having Children, Death of a Parent, Starting School, and Falling in Love. In addition, our results show that, while semantic graphs are effective at discriminating the theme of the post (e.g. the topic of marriage), syntactic graphs help identify whether the post describes a personal event (e.g. someone getting married)
- …