10 research outputs found

    Unsupervised Heterogeneous Graph Neural Networks for One-Class Tasks: Exploring Early Fusion Operators

    Heterogeneous graphs are an essential structure for modeling real-world data through different types of nodes and the relationships between them, including multimodal data, which comprises different types of data such as text, image, and audio. Graph Neural Networks (GNNs) are a prominent graph representation learning method that exploits the graph structure and its attributes; when applied to a multimodal heterogeneous graph, they learn a single semantic space for the different modalities. Consequently, they allow multimodal fusion through simple operators such as sum, average, or multiplication, generating unified representations that account for the supplementary and complementary relationships between the modalities. In multimodal heterogeneous graphs, the labeling process tends to be even more costly because of the multiple modalities analyzed, in addition to the class imbalance inherent to some applications. To overcome these problems in applications that involve a class of interest, One-Class Learning (OCL) is used. Given the lack of studies on multimodal early fusion in heterogeneous graphs for OCL tasks, we proposed a method based on unsupervised GNNs for heterogeneous graphs and evaluated different early fusion operators. In this paper, we extend that earlier work by evaluating the behavior of the main GNN convolutions in the method. We highlight that average, addition, and subtraction were the best early fusion operators. In addition, GNN layers that do not use an attention mechanism performed better. We therefore argue for heterogeneous graph neural networks in multimodal settings that use simple early fusion operators instead of the commonly used concatenation, together with less complex convolutions.
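The simple early fusion operators named in this abstract can be illustrated with a minimal sketch. It assumes that a node has two modality embeddings of equal size that already live in the same semantic space; the names `fuse`, `FUSION_OPERATORS`, `text_embedding`, and `image_embedding` are hypothetical and are not taken from the paper, and the sketch does not reproduce the authors' GNN method.

```python
from typing import Callable, Dict, List

Vector = List[float]


def _elementwise(op: Callable[[float, float], float], a: Vector, b: Vector) -> Vector:
    """Apply a binary operator position by position to two embeddings."""
    if len(a) != len(b):
        raise ValueError("element-wise fusion needs embeddings of equal size")
    return [op(x, y) for x, y in zip(a, b)]


# Hypothetical registry of fusion operators (illustrative names only).
FUSION_OPERATORS: Dict[str, Callable[[Vector, Vector], Vector]] = {
    "sum": lambda a, b: _elementwise(lambda x, y: x + y, a, b),
    "average": lambda a, b: _elementwise(lambda x, y: (x + y) / 2.0, a, b),
    "subtraction": lambda a, b: _elementwise(lambda x, y: x - y, a, b),
    "multiplication": lambda a, b: _elementwise(lambda x, y: x * y, a, b),
    # Concatenation is the only operator here that changes the output size.
    "concatenation": lambda a, b: list(a) + list(b),
}


def fuse(operator: str, a: Vector, b: Vector) -> Vector:
    """Fuse two modality embeddings with the named operator."""
    return FUSION_OPERATORS[operator](a, b)


# Toy embeddings for one node (assumed values, not data from the paper).
text_embedding = [1.0, 2.0, 3.0]
image_embedding = [3.0, 2.0, 1.0]

for name in FUSION_OPERATORS:
    print(name, fuse(name, text_embedding, image_embedding))
```

The element-wise operators keep the fused vector at the original dimensionality, whereas concatenation doubles it, which is one practical difference between the two families of operators.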

    A software tool for retrieving and analyzing open access articles in agriculture using text mining techniques.

    This paper presents CRITIC (Compilação e Recuperação de Informação Técnico-científica e Indução ao Conhecimento), a software tool for compiling and retrieving technical and scientific information and inducing knowledge from scientific articles through text mining techniques. The software uses a service provider to access open access repositories of article references. CRITIC is being developed for release in an open source repository. In its current version, it supports exploratory analysis of query results, automatically identifying the hierarchical topics covered by the query, the temporal distribution of these topics, and the geospatial distribution of the subjects covered by the texts. We discuss some results of the first version, the text mining methodology used, and the software architecture, which is designed to be easily extended.

    Unsupervised learning of topic hierarchies from dynamic text collections

    The need to extract new and useful knowledge from large textual collections has increasingly motivated research on Text Mining methods. Among the existing methods, initiatives for organizing knowledge through topic hierarchies stand out. In a topic hierarchy, the knowledge implicit in the texts is represented by topics and subtopics, and each topic contains documents related to the same theme. Topic hierarchies play an important role in information retrieval, especially in exploratory search tasks, because they allow the analysis of the knowledge of interest at various levels of granularity and the interactive exploration of large document collections. Hierarchical clustering methods have been used to support the construction of topic hierarchies, since they organize textual collections into clusters and subclusters, in an unsupervised manner, using the similarities among documents. However, most hierarchical clustering methods are not suitable for scenarios with dynamic text collections, since frequent clustering updates are required. Clustering methods that meet the requirements of dynamic scenarios must process new documents as soon as they are added to the collection, performing the clustering incrementally. Thus, this work explores incremental clustering methods for the unsupervised learning of topic hierarchies in dynamic text collections. Incremental clustering is used to build and update a condensed representation of the texts, which maintains a summary of the main features of the data. Hierarchical clustering algorithms can then be applied to the condensed representations, organizing the textual collection more efficiently.
We experimentally evaluated three incremental clustering strategies from the literature and proposed an alternative strategy that is more appropriate for topic hierarchies. The results indicated that topic hierarchies built with incremental clustering have quality close to that of topic hierarchies built by non-incremental methods, with a significant reduction in computational cost.
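As a rough illustration of the idea of a condensed representation that is updated incrementally, the sketch below keeps one summary (document count and per-dimension sum) per group and assigns each arriving document to the most similar summary. It is a generic leader-style scheme under stated assumptions, not the strategy proposed in the thesis; `MicroCluster`, `add_document`, and the `min_similarity` threshold of 0.8 are hypothetical names and values chosen for the example.

```python
import math
from typing import List

Vector = List[float]


class MicroCluster:
    """Condensed summary of a group of documents: count and per-dimension sum."""

    def __init__(self, vector: Vector) -> None:
        self.count = 1
        self.linear_sum = list(vector)

    def centroid(self) -> Vector:
        return [s / self.count for s in self.linear_sum]

    def absorb(self, vector: Vector) -> None:
        """Update the summary in place; the document itself is not stored."""
        self.count += 1
        self.linear_sum = [s + v for s, v in zip(self.linear_sum, vector)]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def add_document(summaries: List[MicroCluster], vector: Vector,
                 min_similarity: float = 0.8) -> List[MicroCluster]:
    """Assign a new document to the closest summary, or open a new one."""
    best = max(summaries, key=lambda mc: cosine(mc.centroid(), vector), default=None)
    if best is not None and cosine(best.centroid(), vector) >= min_similarity:
        best.absorb(vector)
    else:
        summaries.append(MicroCluster(vector))
    return summaries


# Toy document vectors arriving one at a time (assumed values).
stream = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
summaries: List[MicroCluster] = []
for doc in stream:
    add_document(summaries, doc)

print([(mc.count, mc.centroid()) for mc in summaries])
```

A hierarchical clustering algorithm could then be run over the centroids of the summaries rather than over every document, which is where the reduction in computational cost would come from in a scheme of this kind.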

    Machine learning with privileged information: approaches for hierarchical text clustering

    Hierarchical text clustering methods are very useful for analyzing the knowledge embedded in textual collections, organizing textual documents into clusters and subclusters to facilitate knowledge exploration at various levels of granularity. Such methods belong to unsupervised machine learning, since the clustering models are obtained only by observing regularities in the textual collection, without human supervision. Traditional clustering methods assume that the text collection is represented only by technical information, i.e., words and phrases extracted directly from the texts. However, in many clustering tasks there is additional and valuable knowledge about the data, usually extracted by an advanced process with the support of domain experts. Because of the high cost of obtaining such data, this additional information is defined as privileged and is usually available for only a subset of the textual documents. Recently, a new machine learning paradigm called LUPI (Learning Using Privileged Information) was proposed by Vapnik to incorporate privileged information into supervised learning methods. In this doctoral thesis, the LUPI paradigm was extended to the unsupervised learning setting, in particular to hierarchical text clustering. We proposed and evaluated approaches to deal with different challenges in clustering tasks, involving the extraction and structuring of privileged information and its use to refine or correct clustering models.
The proposed approaches were effective in (i) consensus clustering, allowing different textual representations and clustering solutions to be combined; (ii) metric learning, in which more robust proximity measures were obtained from the privileged information; and (iii) model selection, in which the privileged information is exploited to identify relevant hierarchical clustering structures. All the approaches presented were investigated in an incremental clustering scenario, allowing their use in practical applications that require computational efficiency and deal with a high frequency of publication of new textual knowledge.

    A process to support analysts in exploring and selecting content from online forums

    The public content increasingly available on the Internet, especially in online forums, enables researchers to study society in new ways. However, qualitative analysis of online forums is very time-consuming, and most content is not related to researchers’ interests. Consequently, analysts face the following problem: how to efficiently explore and select the content to be analyzed? This article introduces a new process to support analysts in solving this problem. The process is based on unsupervised machine learning techniques such as hierarchical clustering and term co-occurrence networks. A tool that helps apply the proposed process was created to provide consolidated and structured results, including measurements and a content exploration interface.

    Using Opinion Mining in Context-Aware Recommender Systems: A Systematic Review

    Recommender systems help users by recommending items, such as products and services, that may be of interest to them. Context-aware recommender systems have been widely investigated in both academia and industry because they can make recommendations based on a user’s current context (e.g., location and time). Moreover, the advent of Web 2.0 and the growing popularity of social and e-commerce media sites have encouraged users to write texts describing their assessment of items. There are increasing efforts to incorporate the rich information embedded in users’ reviews and texts into recommender systems. Given the importance of this type of text and its use along with opinion mining and contextual information extraction techniques for recommender systems, we present a systematic review of recommender systems that explore both contextual information and opinion mining. This systematic review followed a well-defined protocol. Its results were based on 17 papers, selected from 195 papers identified in four digital libraries. The results of this review give a general summary of the current research on this subject and point out some areas that may be improved in future primary works.

    Exploiting Text Mining Techniques for Contextual Recommendations

    Unlike traditional recommender systems, which make recommendations using only the relation between users and items, a context-aware recommender system makes recommendations by incorporating available contextual information into the recommendation process. One problem with context-aware approaches is that they require techniques to extract such additional information automatically. In this paper, we propose to use two text mining techniques that are applied to textual data to infer contextual information automatically: named entity recognition and topic hierarchies. We evaluate the proposed technique in four context-aware recommender systems. The empirical results demonstrate that by using named entities and topic hierarchies we can provide better recommendations. Funding: São Paulo Research Foundation (FAPESP) (grants 2010/20564-8, 2011/19850-9, 2012/13830-9, 2013/16039-3, 2013/22547-1), CAPES, CNP