228 research outputs found

    Global geometric graph kernels and applications

    This thesis explores the topics of graph kernels and classification of graphs. Graph kernels have received considerable attention in the last decade, in part because of their value in many practical applications, such as chemoinformatics and molecular biology, in which classification using graph kernels has become the standard model for several problems. Perhaps even more important is the inclusion of graph kernels in the rich field of kernel methods, making a large family of machine learning algorithms, including support vector machines, applicable to data naturally represented as graphs. Graph kernels are similarity functions defined on pairs of graphs. Traditionally, graph kernels compare graphs in terms of features of subgraphs such as walks, paths or tree patterns. For the kernels to remain computationally efficient, these subgraphs are often chosen to be small. As a result, most graph kernels adopt an inherently local perspective on the graph and may fail to discern global properties, such as the girth or the chromatic number, that are not captured in local structure. Furthermore, existing work on graph kernels lacks results justifying a particular choice of kernel for a given application. In this thesis we propose two new graph kernels designed to capture such global properties of graphs. At the core of these kernels is the Lovász number, an important concept in graph theory with strong connections to graph properties like the chromatic number and the size of the largest clique. We give efficient sampling approximations to both kernels, allowing them to scale to large graphs. We also show that we can characterize the separation margin induced by these kernels in certain classification tasks. This serves as initial progress towards making theory aid kernel choice. We make an extensive empirical evaluation of both kernels on synthetic data with known global properties, and on real graphs frequently used to benchmark graph kernels. Finally, we present a new application of graph kernels in the field of data mining by redefining an important subproblem of entity disambiguation as a graph classification problem. We show empirically that our proposed method improves on the state of the art.

    Knowledge extraction from unstructured data

    Data availability is becoming more essential, considering the current growth of web-based data. The data available on the web are represented as unstructured, semi-structured, or structured data. To make web-based data available for Natural Language Processing or Data Mining tasks, the data need to be presented in a structured, machine-readable format. Thus, techniques for capturing knowledge from unstructured data sources are needed. Research communities address this problem with knowledge extraction methods, which capture knowledge in natural language text and map the extracted knowledge to existing knowledge represented in knowledge graphs (KGs). These methods include named-entity recognition, named-entity disambiguation, relation recognition, and relation linking. This thesis addresses the problem of extracting knowledge from unstructured data and discovering patterns in the extracted knowledge. We devise a rule-based approach for entity and relation recognition and linking. The approach effectively maps entities and relations within a text to their resources in a target KG. Additionally, it overcomes the challenges of recognizing and linking entities and relations to a specific KG by employing catalogs of linguistic and domain-specific rules, which state the criteria for recognizing entities in a sentence of a particular language, and a deductive database that encodes knowledge in community-maintained KGs. Moreover, we define a neuro-symbolic approach for knowledge extraction in encyclopedic and domain-specific settings; it combines symbolic and sub-symbolic components to overcome the challenges of entity recognition and linking and the limited availability of training data, while maintaining the accuracy of recognizing and linking entities. Additionally, we present a context-aware framework for unveiling semantically related posts in a corpus; it is a knowledge-driven framework that retrieves associated posts effectively. We cast the problem of unveiling semantically related posts in a corpus as the Vertex Coloring Problem. We evaluate the performance of our techniques on several benchmarks from various domains for knowledge extraction tasks. Furthermore, we apply these methods in real-world scenarios from national and international projects. The outcomes show that our techniques are able to effectively extract knowledge encoded in unstructured data and discover patterns over the extracted knowledge presented as machine-readable data. More importantly, the evaluation results provide evidence of the effectiveness of combining the reasoning capacity of symbolic frameworks with the pattern recognition and classification power of sub-symbolic models.

    Proceedings of the 2nd Computer Science Student Workshop: Microsoft Istanbul, Turkey, April 9, 2011

    Exploring Ancient Networks

    A small archive of texts from ancient Iraq is used to demonstrate an approach to network analysis in which traditional close reading and computational text analysis go hand in hand. The computational methods produce tables and graphs that link back to online editions of the primary material, enabling the user to check the results.

    Learning with Geometric Embeddings of Graphs

    Graphs are natural representations of problems and data in many fields. For example, in computational biology, interaction networks model the functional relationships between genes in living organisms; in the social sciences, graphs are used to represent friendships and business relations among people; in chemoinformatics, graphs represent atoms and molecular bonds. Fields like these are often rich in data, to the extent that manual analysis is not feasible and machine learning algorithms are necessary to exploit the wealth of available information. Unfortunately, machine learning research is heavily biased in favor of algorithms that operate only on continuous vector-valued data and are not suited to the combinatorial structure of graphs. In this thesis, we show how to leverage both the expressive power of graphs and the strength of established machine learning tools by introducing methods that combine geometric embeddings of graphs with standard learning algorithms. We demonstrate the generality of this idea by developing embedding algorithms for both simple and weighted graphs and applying them in both supervised and unsupervised learning problems such as classification and clustering. Our results provide both theoretical support for the usefulness of graph embeddings in machine learning and empirical evidence showing that this framework is often more flexible and better performing than competing machine learning algorithms for graphs.