
    Evolução da semissupervisão em detecção online de agrupamentos (Evolution of Semi-Supervision in Online Clustering)

    The huge volume of data now available makes manual retrieval of relevant information impractical. Automatic methods for organizing data, such as clustering, can help by giving timely access to the desired information. Semi-supervised clustering approaches use additional information to guide the attribute-based clustering process toward a partition closer to the one the user wants. However, this extra information may change over time, which changes how the data should be organized. To help cope with this issue, this dissertation proposes CABESS (Cluster Adaptation Based on Evolving Semi-Supervision), a framework for online semi-supervised clustering. The framework handles evolving semi-supervision obtained through binary user feedback. To validate the approach, experiments were run on seven datasets with hierarchy-based labels, considering cluster splits (specialization) and merges (generalization) over time. The experimental results show the potential of the proposed framework for dealing with evolving semi-supervision. They also show that the framework is faster than traditional semi-supervised clustering algorithms, even though it uses a weaker form of semi-supervision.
    CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior; CNPq - Conselho Nacional de Desenvolvimento Científico e Tecnológico; FAPEMIG - Fundação de Amparo a Pesquisa do Estado de Minas Gerais; UFU - Universidade Federal de Uberlândia. Master's dissertation (Dissertação de Mestrado).

    Multiclass Data Segmentation using Diffuse Interface Methods on Graphs

    We present two graph-based algorithms for multiclass segmentation of high-dimensional data. The algorithms use a diffuse interface model based on the Ginzburg-Landau functional, related to total variation compressed sensing and image processing. A multiclass extension is introduced using the Gibbs simplex, with the functional's double-well potential modified to handle the multiclass case. The first algorithm minimizes the functional using a convex splitting numerical scheme. The second algorithm uses a graph adaptation of the classical numerical Merriman-Bence-Osher (MBO) scheme, which alternates between diffusion and thresholding. We demonstrate the performance of both algorithms experimentally on synthetic data, grayscale and color images, and several benchmark data sets such as MNIST, COIL and WebKB. We also make use of fast numerical solvers for finding the eigenvectors and eigenvalues of the graph Laplacian, and take advantage of the sparsity of the matrix. Experiments indicate that the results are competitive with or better than the current state-of-the-art multiclass segmentation algorithms. Comment: 14 pages
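The alternation between diffusion and thresholding that the abstract attributes to the MBO scheme can be illustrated with a small sketch. This is not the authors' implementation: it assumes a dense affinity matrix, explicit Euler diffusion steps instead of the eigenvector-based solver the abstract mentions, and a hard clamp on labeled nodes in place of a fidelity term. The function name `graph_mbo` and all parameters are hypothetical.

```python
def graph_mbo(weights, labeled, n_classes, dt=0.1, n_diffuse=5, n_outer=10):
    """Toy multiclass graph MBO (illustrative sketch, not the paper's code).

    weights: symmetric n x n list of lists of non-negative affinities.
    labeled: dict mapping a few node indices to a known class index.
    Assumes dt * (max node degree) is well below 1 so Euler steps stay stable.
    Returns one class index per node.
    """
    n = len(weights)
    degree = [sum(row) for row in weights]

    def one_hot(k):
        # Vertex k of the Gibbs simplex.
        return [1.0 if j == k else 0.0 for j in range(n_classes)]

    # Labeled nodes start at their simplex vertex, the rest at the barycenter.
    u = [one_hot(labeled[i]) if i in labeled else [1.0 / n_classes] * n_classes
         for i in range(n)]

    for _ in range(n_outer):
        # Diffusion: explicit Euler steps of du/dt = -L u with L = D - W.
        for _ in range(n_diffuse):
            new_u = []
            for i in range(n):
                row = []
                for k in range(n_classes):
                    lap = degree[i] * u[i][k] - sum(
                        weights[i][j] * u[j][k] for j in range(n))
                    row.append(u[i][k] - dt * lap)
                new_u.append(row)
            u = new_u
            for i, k in labeled.items():
                u[i] = one_hot(k)  # assumption: known labels are held fixed
        # Thresholding: snap every row to the simplex vertex of its largest entry.
        u = [one_hot(max(range(n_classes), key=lambda k: row[k])) for row in u]

    return [row.index(1.0) for row in u]


# Two triangles (nodes 0-2 and 3-5) joined by the single edge 2-3.
edge_list = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
W = [[0.0] * 6 for _ in range(6)]
for a, b in edge_list:
    W[a][b] = W[b][a] = 1.0

labels = graph_mbo(W, labeled={0: 0, 5: 1}, n_classes=2)
```

With one labeled node per triangle, each triangle ends up in the class of its labeled node. On larger graphs a spectral or implicit diffusion step would normally replace the explicit loop used here.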

    Identifying Users with Opposing Opinions in Twitter Debates

    In recent times, social media sites such as Twitter have been extensively used for debating politics and public policies. These debates span millions of tweets and numerous topics of public importance. It is therefore imperative to tap this vast trove of data to gain insight into public opinion, especially on hotly contested issues such as abortion and gun reform. In our work, we aim to gauge users' stance on such topics on Twitter. We propose ReLP, a semi-supervised framework that couples a retweet-based label propagation algorithm with a supervised classifier to identify users with differing opinions. In particular, our framework is designed so that it can be easily adapted to different domains with little human supervision while still producing excellent accuracy. Comment: Corrected typos in Section 4, under "Visibly Opinionated Users". The numbers did not add up. Results remain unchanged
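As a rough illustration of label propagation over a retweet graph, the sketch below spreads a few seed stances to unlabeled users by repeated neighbor averaging. It is a generic textbook variant, not the ReLP algorithm itself: the supervised classifier stage is omitted, and the function name, the ±1 stance encoding, and the toy graph are assumptions made for the example.

```python
from collections import defaultdict


def propagate_labels(edges, seeds, n_iter=20):
    """Spread stance scores over an undirected retweet graph (illustrative only).

    edges: iterable of (user_a, user_b) pairs, one per retweet relation.
    seeds: dict mapping a few users to a known stance, +1.0 or -1.0.
    Returns a dict mapping every user in the graph to a score in [-1, 1].
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Seed users start at their known stance, everyone else starts neutral.
    scores = {user: seeds.get(user, 0.0) for user in neighbors}
    for _ in range(n_iter):
        updated = {}
        for user, nbrs in neighbors.items():
            if user in seeds:
                updated[user] = seeds[user]  # seeds stay clamped
            else:
                # Unlabeled users take the mean score of their neighbors.
                updated[user] = sum(scores[v] for v in nbrs) / len(nbrs)
        scores = updated
    return scores


# Hypothetical chain of retweet relations with one seed user at each end.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f")]
seeds = {"a": 1.0, "f": -1.0}
scores = propagate_labels(edges, seeds)
stance = {user: ("pro" if s > 0 else "con") for user, s in scores.items()}
```

Users closer to a seed in the retweet graph inherit that seed's stance; in a full pipeline of the kind the abstract describes, the propagated scores would presumably feed a supervised classifier rather than be thresholded directly.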

    Semi-Supervised Approach to Monitoring Clinical Depressive Symptoms in Social Media

    With the rise of social media, millions of people are routinely expressing their moods, feelings, and daily struggles with mental health issues on social media platforms like Twitter. Unlike traditional observational cohort studies conducted through questionnaires and self-reported surveys, we explore the reliable detection of clinical depression from tweets obtained unobtrusively. Based on the analysis of tweets crawled from users with self-reported depressive symptoms in their Twitter profiles, we demonstrate the potential for detecting clinical depression symptoms which emulate the PHQ-9 questionnaire clinicians use today. Our study uses a semi-supervised statistical model to evaluate how the duration of these symptoms and their expression on Twitter (in terms of word usage patterns and topical preferences) align with the medical findings reported via the PHQ-9. Our proactive and automatic screening tool is able to identify clinical depressive symptoms with an accuracy of 68% and precision of 72%. Comment: 8 pages, Advances in Social Networks Analysis and Mining (ASONAM), 2017 IEEE/ACM International Conference

    EGFC: Evolving Gaussian Fuzzy Classifier from Never-Ending Semi-Supervised Data Streams -- With Application to Power Quality Disturbance Detection and Classification

    Power-quality disturbances lead to several drawbacks, such as limited production capacity, increased line and equipment currents and the consequent ohmic losses, higher operating temperatures, premature faults, reduced life expectancy of machines, equipment malfunction, and unplanned outages. Real-time detection and classification of disturbances are deemed essential to industry standards. We propose an Evolving Gaussian Fuzzy Classification (EGFC) framework for semi-supervised disturbance detection and classification, combined with a hybrid Hodrick-Prescott and Discrete-Fourier-Transform attribute-extraction method applied over a landmark window of voltage waveforms. Disturbances such as spikes, notching, harmonics, and oscillatory transients are considered. Unlike other monitoring systems, which require offline training of models based on a limited amount of data and occurrences, the proposed online data-stream-based EGFC method is able to learn disturbance patterns autonomously from never-ending data streams by adapting the parameters and structure of a fuzzy rule base on the fly. Moreover, the fuzzy model obtained is linguistically interpretable, which improves model acceptability. We show encouraging classification results. Comment: 10 pages, 6 figures, 1 table, IEEE International Conference on Fuzzy Systems (FUZZ-IEEE 2020)