18 research outputs found

    Mining Frequent Neighborhood Patterns in Large Labeled Graphs

    Over the years, frequent subgraphs have been an important class of target patterns in the pattern mining literature, where most works deal with databases holding a number of graph transactions, e.g., chemical structures of compounds. These methods rely heavily on the downward-closure property (DCP) of the support measure to ensure efficient pruning of candidate patterns. In the emerging scenario of single-graph databases, such as the Google Knowledge Graph and the Facebook social graph, the traditional support measure becomes trivial (either 0 or 1). However, to the best of our knowledge, all attempts to redefine support for a single graph have resulted in measures that either lose DCP or are no longer semantically intuitive. This paper targets pattern mining in the single-graph setting. We resolve the "DCP-intuitiveness" dilemma by shifting the mining target from frequent subgraphs to frequent neighborhoods. A neighborhood is a specific topological pattern in which a vertex is embedded, and the pattern is frequent if it is shared by a large portion (above a given threshold) of vertices. We show that the new patterns not only maintain DCP, but also have semantics as significant as those of subgraph patterns. Experiments on real-life datasets demonstrate the feasibility of our algorithms on relatively large graphs, as well as their ability to mine interesting knowledge not discovered in prior works. Comment: 9 pages
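
    The vertex-based support described in this abstract can be illustrated with a minimal sketch. It assumes a strong simplification that is not taken from the paper: a "neighborhood" is reduced to a sequence of edge labels walked outward from a vertex. The toy graph, the names `matches` and `support`, and the labels are all hypothetical.

```python
from typing import Dict, List, Tuple

# Toy labeled graph (hypothetical data): vertex -> list of (edge_label, neighbor).
Graph = Dict[str, List[Tuple[str, str]]]


def matches(graph: Graph, vertex: str, path: List[str]) -> bool:
    """Return True if a walk with the given edge labels starts at `vertex`."""
    if not path:
        return True
    return any(
        label == path[0] and matches(graph, nxt, path[1:])
        for label, nxt in graph.get(vertex, [])
    )


def support(graph: Graph, path: List[str]) -> float:
    """Fraction of vertices embedded in the path-shaped neighborhood."""
    vertices = list(graph)
    return sum(matches(graph, v, path) for v in vertices) / len(vertices)


GRAPH: Graph = {
    "alice": [("follows", "bob"), ("likes", "post1")],
    "bob": [("follows", "carol")],
    "carol": [("likes", "post1")],
    "post1": [],
}

if __name__ == "__main__":
    print(support(GRAPH, ["follows"]))           # 0.5  (alice, bob)
    print(support(GRAPH, ["follows", "likes"]))  # 0.25 (bob only)
```

    In this simplified setting, extending a pattern can only keep or shrink the set of matching vertices, so support never increases. That is the downward-closure behavior the abstract relies on for pruning.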

    Word Sense Disambiguation: A Structured Learning Perspective

    This paper explores the application of structured learning methods (SLMs) to word sense disambiguation (WSD). On one hand, the semantic dependencies between polysemous words in a sentence can be encoded in SLMs. On the other hand, SLMs have achieved significant results in natural language processing, so it is natural to apply them to WSD. However, the characteristics of WSD raise many theoretical and practical problems when SLMs are applied to it. Beginning with a method based on the hidden Markov model, this paper proposes for the first time a comprehensive and unified solution for WSD based on the maximum entropy Markov model, the conditional random field, and the tree-structured conditional random field, and reduces the time complexity and running time of the proposed methods to a reasonable level through beam search, approximate training, and parallel training. Each model upgrade improves performance: introducing one-step dependency improves performance by 1-5 percent, adopting non-independent features improves it by 2-3 percent, and extending the underlying structure to the dependency parse tree improves it by about 1 percent. On the English all-words WSD dataset of Senseval-2004, the method based on the tree-structured conditional random field significantly outperforms the best participating system. Nevertheless, almost all machine learning methods suffer from data sparseness due to the scarcity of sense-tagged data, and SLMs are no exception. Besides improving structured learning methods according to the characteristics of WSD, another way to improve disambiguation performance is to mine disambiguation knowledge from all kinds of sources, such as Wikipedia and parallel corpora, and thereby alleviate the knowledge acquisition bottleneck of WSD.
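
    The abstract names beam search as one way to keep decoding tractable. The following is a generic sketch of beam search over per-word sense candidates, not the paper's implementation. The function `beam_search`, the sense labels, and the toy probabilities are all hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple


def beam_search(
    candidates: Sequence[Sequence[str]],
    score: Callable[[str, str, int], float],
    beam_width: int,
) -> Tuple[List[str], float]:
    """Choose one label per position, keeping the best `beam_width` prefixes.

    `score(prev, cur, i)` is a log-score for picking `cur` at position `i`
    after `prev` ("<s>" marks the start of the sequence).
    """
    beam: List[Tuple[List[str], float]] = [([], 0.0)]
    for i, options in enumerate(candidates):
        expanded = [
            (seq + [cur], total + score(seq[-1] if seq else "<s>", cur, i))
            for seq, total in beam
            for cur in options
        ]
        # Keep only the highest-scoring partial sequences.
        expanded.sort(key=lambda item: item[1], reverse=True)
        beam = expanded[:beam_width]
    return beam[0]


# Hypothetical one-step dependency scores between adjacent senses.
TOY_PROBS = {
    ("<s>", "bank/finance"): 0.6,
    ("<s>", "bank/river"): 0.4,
    ("bank/finance", "deposit/money"): 0.9,
    ("bank/finance", "deposit/sediment"): 0.1,
    ("bank/river", "deposit/money"): 0.2,
    ("bank/river", "deposit/sediment"): 0.8,
}


def toy_score(prev: str, cur: str, i: int) -> float:
    return math.log(TOY_PROBS[(prev, cur)])


SENSES = [["bank/finance", "bank/river"], ["deposit/money", "deposit/sediment"]]

if __name__ == "__main__":
    best, log_total = beam_search(SENSES, toy_score, beam_width=2)
    print(best, round(math.exp(log_total), 2))
```

    A narrower beam trades accuracy for speed. With `beam_width=1` the search is greedy and can miss a sequence whose first label looks weak in isolation.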

    Modeling User Profiles for Content Recommendation on Twitter

    This work investigates different mechanisms for inferring the semantics of Twitter messages in order to model user profiles. Natural language processing methods are introduced and analyzed to propose different ways of inferring users' interests from their tweets. These strategies are then compared to analyze how they behave when recommending messages from other users. Sociedad Argentina de Informática e Investigación Operativa (SADIO)
