Mining Frequent Neighborhood Patterns in Large Labeled Graphs
Over the years, frequent subgraphs have been an important class of target
patterns in the pattern mining literature, where most work deals with
databases holding many graph transactions, e.g., the chemical structures of
compounds. These methods rely heavily on the downward-closure property (DCP)
of the support measure to prune candidate patterns efficiently.
When switching to the emerging scenario of single-graph databases, such as the
Google Knowledge Graph and the Facebook social graph, the traditional support
measure becomes trivial (either 0 or 1). However, to the best of our
knowledge, all attempts to redefine support for a single graph have resulted
in measures that either lose the DCP or are no longer semantically intuitive.
This paper targets pattern mining in the single-graph setting. We resolve
the DCP-versus-intuitiveness dilemma by shifting the mining target from
frequent subgraphs to frequent neighborhoods. A neighborhood is a specific
topological pattern in which a vertex is embedded, and the pattern is frequent
if it is shared by a large portion (above a given threshold) of the vertices.
We show that the new patterns not only retain the DCP but also carry semantics
as significant as those of subgraph patterns. Experiments on real-life
datasets demonstrate the feasibility of our algorithms on relatively large
graphs, as well as their capability to mine interesting knowledge not
discovered by prior work.
Word Sense Disambiguation: A Structured Learning Perspective
This paper explores the application of structured learning methods (SLMs) to word sense disambiguation (WSD). On one hand, SLMs can encode the semantic dependencies between polysemous words in a sentence; on the other hand, SLMs have achieved notable success in natural language processing, so applying them to WSD is a natural idea. However, the characteristics of WSD raise many theoretical and practical problems for SLMs. Starting from a method based on the hidden Markov model, this paper proposes, for the first time, a comprehensive and unified solution for WSD based on the maximum entropy Markov model, the conditional random field, and the tree-structured conditional random field, and reduces the time complexity and running time of the proposed methods to a reasonable level through beam search, approximate training, and parallel training. Each model upgrade brings a performance improvement: introducing one-step dependencies improves performance by 1-5 percent, adopting non-independent features improves it by 2-3 percent, and extending the underlying structure to the dependency parse tree improves it by about 1 percent. On the English all-words WSD dataset of Senseval-2004, the method based on the tree-structured conditional random field significantly outperforms the best participating system. Nevertheless, almost all machine learning methods, SLMs included, suffer from data sparseness due to the scarcity of sense-tagged data. Besides improving structured learning methods to suit the characteristics of WSD, another way to improve disambiguation performance is to mine disambiguation knowledge from sources such as Wikipedia and parallel corpora, thereby alleviating the knowledge acquisition bottleneck of WSD.
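The beam search mentioned above is a standard way to cut the cost of decoding a sequence labeler, which is how the paper reduces the running time of its Markov-chain models. The sketch below uses toy log-scores and label names, not the paper's MEMM/CRF/TCRF models:

```python
def beam_decode(scores, trans, beam_size=2):
    """Beam-search decoding for a sequence labeler.

    scores[t][y]   : emission log-score of label y at position t
    trans[y1][y2]  : transition log-score from label y1 to y2

    Only the top `beam_size` partial sequences are kept at each
    step, trading exact Viterbi decoding for linear-in-beam cost.
    (Toy sketch; labels and scores are illustrative.)
    """
    beam = [((), 0.0)]
    for emit in scores:
        cand = []
        for seq, s in beam:
            for y, e in emit.items():
                tr = trans[seq[-1]][y] if seq else 0.0
                cand.append((seq + (y,), s + e + tr))
        cand.sort(key=lambda x: x[1], reverse=True)
        beam = cand[:beam_size]  # prune to the best partial paths
    return beam[0]

scores = [{"s1": 0.0, "s2": -1.0},
          {"s1": -2.0, "s2": -0.5},
          {"s1": -0.1, "s2": -3.0}]
trans = {"s1": {"s1": -0.2, "s2": -1.5},
         "s2": {"s1": -0.3, "s2": -0.2}}
best_seq, best_score = beam_decode(scores, trans)
print(best_seq)  # ('s2', 's2', 's1')
```

With a beam of 2 over 2 labels this happens to recover the exact best path; in WSD the label set is the (large) sense inventory of each word, which is where the pruning pays off.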
User Profile Modeling for Content Recommendation on Twitter
This work investigates different mechanisms for inferring the semantics of Twitter messages in order to model user profiles. Natural language processing methods are introduced and analyzed to propose different ways of inferring users' interests from their tweets. These strategies are then compared by analyzing their behavior when recommending messages from other users. Sociedad Argentina de Informática e Investigación Operativa (SADIO)
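The profile-and-recommend pipeline the abstract describes can be sketched minimally. The bag-of-words profile and cosine-similarity ranking below are illustrative assumptions; the work compares several NLP strategies and does not necessarily use this one:

```python
import math
from collections import Counter

def tf_profile(tweets):
    """Build a term-frequency profile from a user's tweets.
    (Illustrative bag-of-words model; a hypothetical stand-in
    for the interest-inference strategies compared in the paper.)"""
    return Counter(w.lower() for t in tweets for w in t.split())

def cosine(p, q):
    """Cosine similarity between two term-frequency profiles."""
    dot = sum(p[w] * q[w] for w in p)
    norm = (math.sqrt(sum(v * v for v in p.values()))
            * math.sqrt(sum(v * v for v in q.values())))
    return dot / norm if norm else 0.0

user = tf_profile(["machine learning on graphs", "graph mining papers"])

# Rank candidate messages from other users by similarity to the profile.
candidates = ["new graph mining tutorial", "best pizza recipes"]
ranked = sorted(candidates,
                key=lambda m: cosine(user, tf_profile([m])),
                reverse=True)
print(ranked[0])  # the graph-mining message
```

The recommendation step is then just taking the top-ranked candidates, and swapping `tf_profile` for a different interest-inference method changes the comparison without touching the ranking code.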