7,068 research outputs found
Abusive Language Detection in Online Conversations by Combining Content- and Graph-based Features
In recent years, online social networks have allowed users worldwide to meet
and discuss. As guarantors of these communities, the administrators of these
platforms must prevent users from adopting inappropriate behaviors. This
verification task, mainly done by humans, is increasingly difficult due to the
ever-growing number of messages to check. Methods have been proposed to
automate this moderation process, mainly by providing approaches based on the
textual content of the exchanged messages. Recent work has also shown that
characteristics derived from the structure of conversations, in the form of
conversational graphs, can help detect these abusive messages. In this
paper, we take advantage of both sources of information by proposing
fusion methods that integrate content- and graph-based features. Our experiments on
raw chat logs show that the content of the messages and their dynamics
within a conversation contain partially complementary information, allowing
performance improvements on an abusive message classification task with a final
F-measure of 93.26%.
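The abstract does not specify how the content- and graph-based features are fused. As a minimal sketch, the code below illustrates two generic fusion strategies: early fusion (concatenating both feature vectors before training a single classifier) and late fusion (a weighted average of the abuse probabilities from two separately trained classifiers). It also computes the F-measure, assuming it is the F1 score on the abusive class. All function names and the default weight are hypothetical, not the authors' implementation.

```python
from typing import Sequence


def early_fusion(content: Sequence[float], graph: Sequence[float]) -> list[float]:
    """Early fusion: concatenate content- and graph-based feature vectors
    so that a single classifier is trained on both."""
    return list(content) + list(graph)


def late_fusion(p_content: float, p_graph: float, weight: float = 0.5) -> float:
    """Late fusion: weighted average of the abuse probabilities produced by
    two separately trained classifiers (weight applies to the content one)."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * p_content + (1.0 - weight) * p_graph


def f_measure(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 score for the positive (here: abusive) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, a message scored 0.8 by the content classifier and 0.4 by the graph classifier receives a fused score of 0.6 with equal weights; learning the weight (or a meta-classifier) on held-out data is a common refinement.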
Discriminating word senses with tourist walks in complex networks
Patterns of topological arrangement are widely used by both animal and human
brains in the learning process. Nevertheless, automatic learning techniques
frequently overlook these patterns. In this paper, we apply a learning
technique based on the structural organization of the data in the attribute
space to the problem of discriminating the senses of 10 polysemous words. Using
two types of characterization of meanings, namely semantic and topological
approaches, we observed significant accuracy rates in identifying the
suitable meanings with both techniques. Most importantly, we found that the
characterization based on the deterministic tourist walk improves the
disambiguation process compared with the discrimination achieved with
traditional complex network measurements such as assortativity and the clustering
coefficient. To our knowledge, this is the first time that such a deterministic
walk has been applied to this kind of problem. Therefore, our finding
suggests that the tourist walk characterization may be useful in other related
applications.
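The abstract names the deterministic tourist walk without defining it. In its usual formulation, a walker on a set of points moves at each step to the nearest point not visited during the last μ steps; the trajectory eventually settles, after a transient of length t, into a cycle (attractor) of period p, and statistics of (t, p) can serve as features. The sketch below assumes 2D points, Euclidean distance, a memory window that includes the current point, and ties broken by smallest index. `tourist_walk` is a hypothetical name, and how the paper turns (t, p) into word-sense features is not shown.

```python
import math
from typing import Sequence

Point = tuple[float, float]


def tourist_walk(points: Sequence[Point], start: int, mu: int) -> tuple[int, int]:
    """Run a deterministic tourist walk and return (transient, period).

    Assumption: the walker may not move to any of the last `mu` points it
    visited (the current point included); among the remaining points it goes
    to the nearest one by Euclidean distance, breaking ties by smallest index.
    """
    n = len(points)
    if not 1 <= mu < n:
        raise ValueError("memory mu must satisfy 1 <= mu < len(points)")
    path = [start]
    first_seen: dict[tuple[int, ...], int] = {}
    while True:
        step = len(path) - 1
        # Once the path is long enough, the next move depends only on this window,
        # so a repeated window means the walk has entered its attractor.
        state = tuple(path[-mu:])
        if state in first_seen:
            cycle_start = first_seen[state]
            return cycle_start, step - cycle_start
        first_seen[state] = step
        here = points[path[-1]]
        candidates = (i for i in range(n) if i not in state)
        path.append(min(candidates, key=lambda i: (math.dist(here, points[i]), i)))
```

With the collinear points (0,0), (1,0), (3,0), (6,0), starting at (6,0) with μ = 1, the walker visits points 3, 2, 1, 0, 1 and the function returns (2, 2): a transient of two steps followed by a two-cycle between the mutually nearest points 0 and 1.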
Reservoir of Diverse Adaptive Learners and Stacking Fast Hoeffding Drift Detection Methods for Evolving Data Streams
The last decade has seen a surge of interest in adaptive learning algorithms
for data stream classification, with applications ranging from predicting ozone
level peaks and learning stock market indicators to detecting computer security
violations. In addition, a number of methods have been developed to detect
concept drifts in these streams. Consider a scenario where we have a number of
classifiers with diverse learning styles and different drift detectors.
Intuitively, the current 'best' (classifier, detector) pair is application
dependent and may change as a result of the stream evolution. Our research
builds on this observation. We introduce the Tornado framework, which
implements a reservoir of diverse classifiers together with a variety of drift
detection algorithms. In our framework, all (classifier, detector) pairs
proceed in parallel to construct models against the evolving data streams. At
any point in time, we select the pair that currently yields the best
performance. We further incorporate two novel stacking-based drift detection
methods, namely the FHDDMS and FHDDMS_add approaches. The
experimental evaluation confirms not only that the current 'best' (classifier, detector)
pair is heavily dependent on the characteristics of the stream, but
also that this selection evolves as the stream flows. Further, our
FHDDMS variants detect concept drifts accurately and in a timely fashion
while outperforming the state of the art. Comment: 42 pages, 14 figures
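The FHDDMS detectors build on the Fast Hoeffding Drift Detection Method (FHDDM). As we understand it, FHDDM slides a window of size n over the classifier's correct/incorrect prediction outcomes, tracks the highest windowed accuracy p_max seen so far, and signals a drift when p_max - p_t reaches the Hoeffding bound ε = sqrt(ln(1/δ) / (2n)). The sketch below implements only that single-window test; the window stacking that distinguishes FHDDMS and FHDDMS_add is not reproduced, and the class name and default parameters are assumptions.

```python
import math
from collections import deque


class HoeffdingWindowDetector:
    """Single-window drift test in the spirit of FHDDM (a sketch, not the
    authors' code). Feed it one prediction outcome at a time."""

    def __init__(self, window_size: int = 100, delta: float = 1e-6) -> None:
        if window_size < 1 or not 0.0 < delta < 1.0:
            raise ValueError("need window_size >= 1 and 0 < delta < 1")
        self.window_size = window_size
        # Hoeffding bound: epsilon = sqrt(ln(1/delta) / (2n)).
        self.epsilon = math.sqrt(math.log(1.0 / delta) / (2 * window_size))
        self.reset()

    def reset(self) -> None:
        # 1 = correct prediction, 0 = incorrect; the oldest outcome drops off.
        self.window = deque(maxlen=self.window_size)
        self.p_max = 0.0

    def update(self, correct: bool) -> bool:
        """Add one prediction outcome; return True if a drift is signalled."""
        self.window.append(1 if correct else 0)
        if len(self.window) < self.window_size:
            return False  # wait until the window is full
        p_t = sum(self.window) / self.window_size
        self.p_max = max(self.p_max, p_t)
        if self.p_max - p_t >= self.epsilon:
            self.reset()  # start afresh after a detected drift
            return True
        return False
```

With window_size=100 and delta=1e-6, ε is about 0.263, so after a long run of correct predictions the detector fires once 27 of the last 100 predictions are wrong. In a Tornado-style reservoir, each (classifier, detector) pair would own its own detector instance.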
Stop Clickbait: Detecting and Preventing Clickbaits in Online News Media
Most online news media outlets rely heavily on the revenue generated
by their readers' clicks, and due to the presence of numerous such
outlets, they need to compete with each other for reader attention. To attract
readers to click on an article and subsequently visit the media site, the
outlets often come up with catchy headlines accompanying the article links,
which lure readers into clicking on the link. Such headlines are known as
clickbaits. While these baits may trick readers into clicking, in the long
run clickbaits usually do not live up to the readers' expectations and
leave them disappointed.
In this work, we attempt to automatically detect clickbaits and then build a
browser extension that warns readers of different media sites about the
possibility of being baited by such headlines. The extension also offers each
reader an option to block clickbaits she does not want to see. Then, using such
reader choices, the extension automatically blocks similar clickbaits during
her future visits. We run extensive offline and online experiments across
multiple media sites and find that the proposed clickbait detection and
personalized blocking approaches perform well, achieving 93% accuracy in
detecting and 89% accuracy in blocking clickbaits. Comment: 2016 IEEE/ACM International Conference on Advances in Social Networks
Analysis and Mining (ASONAM)
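The abstract does not say how "similar" clickbaits are identified for personalized blocking. One simple way to illustrate the idea is to compare a new headline's word set with the word sets of headlines the reader has already blocked, using Jaccard similarity and a threshold. This is a swapped-in technique for illustration, not necessarily the authors' method; `PersonalBlocker`, its helpers, and the 0.5 threshold are hypothetical.

```python
import re


def headline_tokens(headline: str) -> set[str]:
    """Lower-cased word set of a headline (apostrophes kept, e.g. "won't")."""
    return set(re.findall(r"[a-z0-9']+", headline.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """|A ∩ B| / |A ∪ B|, defined as 0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class PersonalBlocker:
    """Hypothetical per-reader blocker: remembers headlines the reader blocked
    and blocks new headlines whose word overlap with any of them is high."""

    def __init__(self, threshold: float = 0.5) -> None:
        self.threshold = threshold
        self.blocked: list[set[str]] = []

    def block(self, headline: str) -> None:
        self.blocked.append(headline_tokens(headline))

    def should_block(self, headline: str) -> bool:
        words = headline_tokens(headline)
        return any(jaccard(words, b) >= self.threshold for b in self.blocked)
```

After blocking "You Won't Believe What This Dog Did Next", the headline "You won't believe what this cat did next" shares 7 of 9 distinct words (similarity ≈ 0.78) and is blocked, while an unrelated news headline is not.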
Naive Bayes multi-label classification approach for high-voltage condition monitoring
This paper addresses, for the first time, the multi-label classification of High-Voltage (HV) discharges captured using the Electromagnetic Interference (EMI) method for HV machines. The approach involves extracting features from EMI time signals, emitted during the discharge events, by means of 1D-Local Binary Pattern (LBP) and 1D-Histogram of Oriented Gradients (HOG) techniques. Their combination provides a feature vector that is fed to a naive Bayes classifier designed to identify the labels of two or more discharge sources contained within a single signal. The performance of this novel approach is measured using various metrics, including average precision, accuracy, specificity, and Hamming loss, among others. Results demonstrate a successful performance that is in line with similar applications in other fields such as biology and image processing. This first attempt at multi-label classification of EMI discharge sources opens a new research topic in HV condition monitoring.
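To make the 1D-LBP feature extraction concrete, here is a minimal sketch of the standard one-dimensional local binary pattern: for each sample, its p neighbours on each side are compared with the sample itself, each neighbour contributes bit 1 when it is greater than or equal to the centre, and the resulting 2p-bit codes are counted in a histogram. The function name and the default p = 4 are assumptions; the 1D-HOG features and the naive Bayes stage described in the abstract are not shown.

```python
def lbp_1d_histogram(signal: list[float], p: int = 4) -> list[int]:
    """Histogram of 1D local binary pattern codes over a signal.

    For every sample with p neighbours on each side, each of the 2p
    neighbours contributes bit 1 if it is >= the centre sample, else 0;
    the resulting 2p-bit code is counted in a histogram of 2**(2p) bins.
    Samples closer than p to either end of the signal are skipped.
    """
    if p < 1:
        raise ValueError("p must be at least 1")
    hist = [0] * (2 ** (2 * p))
    for i in range(p, len(signal) - p):
        centre = signal[i]
        # p samples before the centre followed by p samples after it.
        neighbours = signal[i - p:i] + signal[i + 1:i + p + 1]
        code = 0
        for bit, value in enumerate(neighbours):
            if value >= centre:
                code |= 1 << bit
        hist[code] += 1
    return hist
```

For p = 4 the histogram has 256 bins, one per possible 8-bit code; in a pipeline like the one described, such a histogram (possibly concatenated with 1D-HOG features) would form the input vector to the classifier.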
- …