Evolução da semissupervisão em detecção online de agrupamentos (Evolution of semi-supervision in online clustering)
The huge amount of data currently available makes information retrieval considerably harder. Automatic methods for organizing data, such as clustering, can help with this task by allowing timely access to the desired information. Semi-supervised clustering approaches use additional information to guide attribute-based clustering toward a data partition closer to the one the user wants. However, this extra information may change over time, shifting the way the data should be organized. To help cope with this issue, this dissertation proposes a framework for semi-supervised online clustering called CABESS (Cluster Adaptation Based on Evolving Semi-Supervision). The framework handles evolving semi-supervision obtained through binary user feedback. To validate the approach, experiments were run on seven datasets with hierarchy-based labels, considering cluster splits and merges (specialization and generalization) over time. The experimental results show the potential of the proposed framework for dealing with evolving semi-supervision. They also show that the framework is faster than traditional semi-supervised clustering algorithms, even though it uses a weaker form of semi-supervision.
CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior; CNPq - Conselho Nacional de Desenvolvimento Científico e Tecnológico; FAPEMIG - Fundação de Amparo a Pesquisa do Estado de Minas Gerais; UFU - Universidade Federal de Uberlândia
Dissertation (Master's)
Learning with Partial Supervision for Clustering and Classification
In machine learning, clustering and classification are two fundamental tasks. Traditionally, clustering is unsupervised: no supervision about the data is available for learning. Classification is supervised: fully labeled data are collected to train a classifier. In some scenarios, however, full labels are not available and only partial supervision is given, such as instance similarities or incomplete label assignments. In such cases, traditional clustering and classification methods do not directly apply. To address such problems, this thesis focuses on learning from partial supervision for clustering and classification. For clustering with partial supervision, we investigate three problems: a) constrained clustering in multi-instance multi-label learning, where the goal is to group instances into clusters that respect the background knowledge given by the bag-level labels; b) clustering with constraints, where the partial supervision is expressed as "pairwise constraints" or "relative constraints", which describe similarities among instance pairs and instance triplets respectively; c) active learning of pairwise constraints for clustering, where the goal is to improve the clustering with minimum human effort by iteratively querying an oracle about the most informative pairs. For classification with partial supervision, we address multi-label learning where the data are associated with a latent label hierarchy and incomplete label assignments, and the goal is to simultaneously discover the latent hierarchy and learn a multi-label classifier that is consistent with it.
Keywords: Classification, Partial Supervision, Active Learning, Clusterin
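The two constraint types named in the abstract above can be made concrete with a small sketch. This is an illustration only, not the thesis's method: the function names, the must-link/cannot-link encoding of pairwise constraints, and the rule used to decide when a triplet is violated are all assumptions chosen for the example.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]
Triplet = Tuple[int, int, int]


def pairwise_violations(
    labels: Sequence[int],
    must_link: List[Pair],
    cannot_link: List[Pair],
) -> int:
    """Count pairwise constraints that a cluster assignment breaks.

    Assumed encoding: a must-link pair should share a cluster,
    a cannot-link pair should not.
    """
    violated = 0
    for i, j in must_link:
        if labels[i] != labels[j]:
            violated += 1
    for i, j in cannot_link:
        if labels[i] == labels[j]:
            violated += 1
    return violated


def relative_violations(labels: Sequence[int], triplets: List[Triplet]) -> int:
    """Count relative (triplet) constraints that an assignment breaks.

    Assumed reading: (a, b, c) means "a is more similar to b than to c",
    and it counts as violated when a is grouped with c but not with b.
    """
    return sum(
        1
        for a, b, c in triplets
        if labels[a] == labels[c] and labels[a] != labels[b]
    )


# Hypothetical cluster assignment for five instances.
labels = [0, 0, 1, 1, 2]
must_link = [(0, 1), (1, 2)]
cannot_link = [(2, 3), (0, 4)]
triplets = [(0, 1, 2), (2, 0, 3)]

n_pairwise = pairwise_violations(labels, must_link, cannot_link)
n_relative = relative_violations(labels, triplets)
```

A constrained clustering algorithm would typically try to keep such violation counts low while still fitting the data attributes; how the two goals are traded off is specific to each method and is not shown here.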