Evolução da semissupervisão em detecção online de agrupamentos (Evolution of semi-supervision in online clustering)
The huge amount of data currently available makes information retrieval difficult. Automatic methods for organizing data, such as clustering, can help with this task by enabling timely access to the desired information. Semi-supervised clustering approaches use additional information to guide clustering on the data attributes toward a partition closer to the one the user wants. However, this extra information may change over time, which changes how the data should be organized. To help cope with this issue, this dissertation proposes CABESS (Cluster Adaptation Based on Evolving Semi-Supervision), a framework for online semi-supervised clustering. The framework handles evolving semi-supervision obtained from binary user feedback. To validate the approach, experiments were run on seven datasets with hierarchy-based labels, considering cluster splits and merges over time. The experimental results show the potential of the proposed framework for dealing with evolving semi-supervision. They also show that the framework is faster than traditional semi-supervised clustering algorithms, even though it uses a weaker form of semi-supervision.
CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior; CNPq - Conselho Nacional de Desenvolvimento Científico e Tecnológico; FAPEMIG - Fundação de Amparo a Pesquisa do Estado de Minas Gerais; UFU - Universidade Federal de Uberlândia
Master's dissertation
Multiclass Data Segmentation using Diffuse Interface Methods on Graphs
We present two graph-based algorithms for multiclass segmentation of
high-dimensional data. The algorithms use a diffuse interface model based on
the Ginzburg-Landau functional, related to total variation compressed sensing
and image processing. A multiclass extension is introduced using the Gibbs
simplex, with the functional's double-well potential modified to handle the
multiclass case. The first algorithm minimizes the functional using a convex
splitting numerical scheme. The second algorithm uses a graph adaptation
of the classical numerical Merriman-Bence-Osher (MBO) scheme, which alternates
between diffusion and thresholding. We demonstrate the performance of both
algorithms experimentally on synthetic data, grayscale and color images, and
several benchmark data sets such as MNIST, COIL and WebKB. We also make use of
fast numerical solvers for finding the eigenvectors and eigenvalues of the
graph Laplacian, and take advantage of the sparsity of the matrix. Experiments
indicate that the results are competitive with or better than the current
state-of-the-art multiclass segmentation algorithms.
Comment: 14 pages
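The alternation between diffusion and thresholding described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a dense weight matrix, replaces the eigenvector-based solver with plain explicit Euler steps on the unnormalized graph Laplacian, and adds a simple fidelity pull toward known labels. The names `graph_mbo`, `dt`, `fidelity`, and the toy graph are all hypothetical.

```python
def threshold(row):
    """Project a row onto the nearest vertex of the Gibbs simplex (one-hot)."""
    winner = max(range(len(row)), key=row.__getitem__)
    return [1.0 if k == winner else 0.0 for k in range(len(row))]


def graph_mbo(weights, labels, n_classes, dt=0.1, diffusion_steps=5,
              fidelity=5.0, iterations=20):
    """Toy multiclass MBO on a graph: diffuse, then threshold, and repeat.

    weights: symmetric n x n list of lists with edge weights.
    labels:  list of length n; a class index for labeled nodes, None otherwise.
    """
    n = len(weights)
    degree = [sum(row) for row in weights]
    # Labeled nodes start one-hot; unlabeled nodes start at the simplex centre.
    u = [[1.0 if labels[i] == k else 0.0 for k in range(n_classes)]
         if labels[i] is not None else [1.0 / n_classes] * n_classes
         for i in range(n)]
    anchor = [row[:] for row in u]

    for _ in range(iterations):
        # Diffusion: explicit Euler steps of du/dt = -(L u + fidelity term).
        for _ in range(diffusion_steps):
            new_u = []
            for i in range(n):
                row = []
                for k in range(n_classes):
                    lap = degree[i] * u[i][k] - sum(
                        weights[i][j] * u[j][k] for j in range(n))
                    pull = (fidelity * (u[i][k] - anchor[i][k])
                            if labels[i] is not None else 0.0)
                    row.append(u[i][k] - dt * (lap + pull))
                new_u.append(row)
            u = new_u
        # Thresholding: snap every node to its dominant class.
        u = [threshold(row) for row in u]

    return [row.index(1.0) for row in u]


# Two triangles joined by one weak edge; one labeled node in each triangle.
weights = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0.1, 0, 0],
    [0, 0, 0.1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
seed_labels = [0, None, None, None, None, 1]
segmentation = graph_mbo(weights, seed_labels, n_classes=2)
```

The step size must stay small relative to the node degrees and the fidelity weight for the explicit scheme to remain stable, which is one reason a spectral solver is attractive on larger graphs.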
Identifying Users with Opposing Opinions in Twitter Debates
In recent times, social media sites such as Twitter have been extensively
used for debating politics and public policies. These debates span millions of
tweets and numerous topics of public importance. Tapping this vast trove of
data can yield insights into public opinion, especially on hotly contested
issues such as abortion and gun reform. In our work, we aim to gauge users'
stances on such topics on Twitter. We
propose ReLP, a semi-supervised framework using a retweet-based label
propagation algorithm coupled with a supervised classifier to identify users
with differing opinions. In particular, our framework is designed such that it
can be easily adapted to different domains with little human supervision while
still producing excellent accuracy.
Comment: Corrected typos in Section 4, under "Visibly Opinionated Users". The
numbers did not add up. Results remain unchanged.
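As a rough illustration of label propagation over a retweet graph, the sketch below spreads stance scores from a few hand-labeled seed users to everyone they are connected to through retweets. The abstract does not specify ReLP's update rule, so this is a generic neighbour-averaging scheme, not the authors' algorithm, and it omits the supervised classifier entirely. The function name `propagate_stance`, the +1/-1 stance encoding, and the sample data are assumptions.

```python
from collections import defaultdict


def propagate_stance(retweets, seeds, rounds=10):
    """Generic label propagation on an undirected retweet graph.

    retweets: iterable of (user, retweeted_user) pairs.
    seeds:    dict mapping a few users to a known stance, +1 or -1.
    Returns a dict mapping every user in the graph to +1, -1, or 0 (undecided).
    """
    neighbours = defaultdict(set)
    for user, source in retweets:
        neighbours[user].add(source)
        neighbours[source].add(user)

    scores = {user: float(seeds.get(user, 0.0)) for user in neighbours}
    for _ in range(rounds):
        updated = {}
        for user, linked in neighbours.items():
            if user in seeds:
                # Seed users keep their hand-assigned stance.
                updated[user] = float(seeds[user])
            else:
                # Everyone else takes the mean score of their neighbours.
                updated[user] = sum(scores[v] for v in linked) / len(linked)
        scores = updated

    return {user: (1 if s > 0 else -1 if s < 0 else 0)
            for user, s in scores.items()}


# Hypothetical data: two retweet chains, one seed user in each.
sample_retweets = [("a", "b"), ("b", "c"), ("x", "y"), ("y", "z")]
sample_seeds = {"a": 1, "z": -1}
stances = propagate_stance(sample_retweets, sample_seeds)
```

In a pipeline like the one the abstract describes, the propagated labels would then serve as training data for a classifier; that step is left out here.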
Semi-Supervised Approach to Monitoring Clinical Depressive Symptoms in Social Media
With the rise of social media, millions of people are routinely expressing
their moods, feelings, and daily struggles with mental health issues on social
media platforms like Twitter. Unlike traditional observational cohort studies
conducted through questionnaires and self-reported surveys, we explore the
reliable detection of clinical depression from tweets obtained unobtrusively.
Based on the analysis of tweets crawled from users with self-reported
depressive symptoms in their Twitter profiles, we demonstrate the potential for
detecting clinical depression symptoms which emulate the PHQ-9 questionnaire
clinicians use today. Our study uses a semi-supervised statistical model to
evaluate how the duration of these symptoms and their expression on Twitter (in
terms of word usage patterns and topical preferences) align with the medical
findings reported via the PHQ-9. Our proactive and automatic screening tool is
able to identify clinical depressive symptoms with an accuracy of 68% and
precision of 72%.
Comment: 8 pages, Advances in Social Networks Analysis and Mining (ASONAM),
2017 IEEE/ACM International Conference
EGFC: Evolving Gaussian Fuzzy Classifier from Never-Ending Semi-Supervised Data Streams -- With Application to Power Quality Disturbance Detection and Classification
Power-quality disturbances cause several problems, such as limited
production capacity, increased line and equipment currents and the consequent
ohmic losses, higher operating temperatures, premature faults, reduced
machine life expectancy, equipment malfunction, and unplanned outages.
Real-time detection and classification of disturbances are deemed essential to
industry standards. We propose an Evolving Gaussian Fuzzy Classification (EGFC)
framework for semi-supervised disturbance detection and classification combined
with a hybrid Hodrick-Prescott and Discrete-Fourier-Transform
attribute-extraction method applied over a landmark window of voltage
waveforms. Disturbances such as spikes, notching, harmonics, and oscillatory
transients are considered. Unlike other monitoring systems, which
require offline training of models based on a limited amount of data and
occurrences, the proposed online data-stream-based EGFC method is able to learn
disturbance patterns autonomously from never-ending data streams by adapting
the parameters and structure of a fuzzy rule base on the fly. Moreover, the
fuzzy model obtained is linguistically interpretable, which improves model
acceptability. We show encouraging classification results.
Comment: 10 pages, 6 figures, 1 table, IEEE International Conference on Fuzzy
Systems (FUZZ-IEEE 2020)
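To make the Discrete-Fourier-Transform part of the attribute extraction concrete, the sketch below computes the amplitude of selected harmonics over one window of voltage samples. It is only an illustration under stated assumptions: the window is taken to span exactly one fundamental cycle, the Hodrick-Prescott filtering step is not included, and `harmonic_magnitudes` and the synthetic waveform are hypothetical names and data rather than anything from the paper.

```python
import cmath
import math


def harmonic_magnitudes(window, harmonics):
    """Amplitude of each requested harmonic in a window of samples.

    Assumes the window covers exactly one fundamental period, so harmonic k
    falls on DFT bin k. Valid for 0 < k < len(window) / 2.
    """
    n = len(window)
    magnitudes = {}
    for k in harmonics:
        # Direct evaluation of the k-th DFT coefficient.
        coefficient = sum(
            sample * cmath.exp(-2j * math.pi * k * i / n)
            for i, sample in enumerate(window))
        # Scale by 2/n so a pure sinusoid of amplitude A reports A.
        magnitudes[k] = 2.0 * abs(coefficient) / n
    return magnitudes


# Synthetic waveform: unit fundamental plus a 20% third harmonic.
n_samples = 64
voltage = [
    math.sin(2 * math.pi * i / n_samples)
    + 0.2 * math.sin(2 * math.pi * 3 * i / n_samples)
    for i in range(n_samples)
]
features = harmonic_magnitudes(voltage, [1, 3, 5])
```

Attributes of this kind could then be fed to a classifier; a harmonic disturbance shows up as energy in bins other than the fundamental.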