A Survey on Social Media Anomaly Detection
Social media anomaly detection is of critical importance to prevent malicious
activities such as bullying, terrorist attack planning, and fraud information
dissemination. With the recent popularity of social media, new types of
anomalous behaviors arise, causing concerns from various parties. While a large
amount of work has been dedicated to traditional anomaly detection problems,
we observe a surge of research interests in the new realm of social media
anomaly detection. In this paper, we present a survey on existing approaches to
address this problem. We focus on the new types of anomalous phenomena in
social media and review the recently developed techniques to detect those special
types of anomalies. We provide a general overview of the problem domain, common
formulations, existing methodologies and potential directions. With this work,
we hope to draw the research community's attention to this
challenging problem and open up new directions to which we can contribute in the
future.
Comment: 23 page
Semantic Scan: Detecting Subtle, Spatially Localized Events in Text Streams
Early detection and precise characterization of emerging topics in text
streams can be highly useful in applications such as timely and targeted public
health interventions and discovering evolving regional business trends. Many
methods have been proposed for detecting emerging events in text streams using
topic modeling. However, these methods have numerous shortcomings that make
them unsuitable for rapid detection of locally emerging events on massive text
streams. In this paper, we describe Semantic Scan (SS), which was developed
specifically to overcome these shortcomings in detecting new spatially compact
events in text streams.
Semantic Scan integrates novel contrastive topic modeling with online
document assignment and principled likelihood ratio-based spatial scanning to
identify emerging events with unexpected patterns of keywords hidden in text
streams. This enables more timely and accurate detection and characterization
of anomalous, spatially localized emerging events. Semantic Scan does not
require manual intervention or labeled training data, and is robust to noise in
real-world text data since it identifies anomalous text patterns that occur in
a cluster of new documents rather than an anomaly in a single new document.
We compare Semantic Scan to alternative state-of-the-art methods such as
Topics over Time, Online LDA, and Labeled LDA on two real-world tasks: (i) a
disease surveillance task monitoring free-text Emergency Department chief
complaints in Allegheny County, and (ii) an emerging business trend detection
task based on Yelp reviews. On both tasks, we find that Semantic Scan provides
significantly better event detection and characterization accuracy than
competing approaches, while providing up to an order of magnitude speedup.
Comment: 10 pages, 4 figures, KDD 2016 submissio
Automatic Detection of Trends in Dynamical Text: An Evolutionary Approach
This paper presents an evolutionary algorithm for modeling the arrival dates
of document streams, i.e., time-stamped collections of documents such as
newscasts, e-mails, IRC conversations, scientific journal archives, and weblog
postings. This algorithm assigns frequencies (number of document arrivals per
time unit) to time intervals so that it produces an optimal fit to the data.
The optimization is a trade-off between accurately fitting the data and
avoiding too many frequency changes; this way the analysis is able to find fits
which ignore the noise. Classical dynamic programming algorithms are limited by
memory and efficiency requirements, which can be a problem when dealing with
long streams. This suggests exploring alternative search methods that allow
for some degree of uncertainty to achieve tractability. Experiments have shown
that the designed evolutionary algorithm is able to reach the same solution
quality as those classical dynamic programming algorithms in a shorter time. We
have also explored different probabilistic models to optimize the fitting of
the date streams, and applied these algorithms to infer whether a new arrival
increases or decreases interest in the topic the document stream is
about.
Comment: 22 pages, submitted to Journal of Information Retrieva
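The fit-versus-change trade-off described in this abstract can be illustrated with the classical dynamic programming formulation that the abstract uses as a baseline (not the evolutionary algorithm itself). The sketch below is a minimal illustration under stated assumptions: squared error as the fit cost, a fixed `penalty` per segment, and the function name `segment_rates` are all hypothetical choices, not taken from the paper.

```python
from itertools import accumulate


def segment_rates(counts, penalty):
    """Fit a piecewise-constant arrival rate to per-time-unit document counts.

    Minimises (squared error of the fit) + penalty * (number of segments)
    with an O(n^2) dynamic program. Returns (start, end, rate) tuples,
    where `end` is exclusive.
    """
    n = len(counts)
    # Prefix sums let us compute the squared error of any segment in O(1).
    s1 = [0] + list(accumulate(counts))
    s2 = [0] + list(accumulate(c * c for c in counts))

    def sse(i, j):
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    best = [0.0] + [float("inf")] * n  # best[j]: optimal cost for counts[:j]
    back = [0] * (n + 1)               # back[j]: start of the last segment
    for j in range(1, n + 1):
        for i in range(j):
            cost = best[i] + sse(i, j) + penalty
            if cost < best[j]:
                best[j], back[j] = cost, i

    # Walk the back-pointers to recover the segments.
    segments = []
    j = n
    while j > 0:
        i = back[j]
        segments.append((i, j, (s1[j] - s1[i]) / (j - i)))
        j = i
    return segments[::-1]


print(segment_rates([1, 1, 1, 1, 9, 9, 9, 9], penalty=1.0))
```

A small penalty allows the fit to follow the jump in arrival rate, while a very large penalty collapses the whole stream into a single segment; the quadratic time and linear memory of this table are the kind of cost that motivates approximate search on long streams.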
Location-Based Events Detection on Micro-Blogs
The increasing use of social networks generates enormous amounts of data that
can be used for many types of analysis. Some of these data have temporal and
geographical information, which can be used for comprehensive examination. In
this paper, we propose a new method to analyze the massive volume of messages
available in Twitter to identify places in the world where topics such as TV
shows, climate change, disasters, and sports are emerging. The proposed method
is based on a neural network that is used to detect outliers from a time
series, which is built upon statistical data from tweets located in different
political divisions (i.e., countries, cities). The outliers are used to
identify topics with abnormal behavior on Twitter. The effectiveness of
our method is evaluated in an online environment indicating new findings on
modeling local people's behavior from different places.
Comment: 10 pages, 5 figures, submitted and rejected for SBBD 201
A Study of "Churn" in Tweets and Real-Time Search Queries (Extended Version)
The real-time nature of Twitter means that term distributions in tweets and
in search queries change rapidly: the most frequent terms in one hour may look
very different from those in the next. Informally, we call this phenomenon
"churn". Our interest in analyzing churn stems from the perspective of
real-time search. Nearly all ranking functions, machine-learned or otherwise,
depend on term statistics such as term frequency and document frequency, as well
as query frequencies. In the real-time context, how do we compute these
statistics, considering that the underlying distributions change rapidly? In
this paper, we present an analysis of tweet and query churn on Twitter, as a
first step to answering this question. Analyses reveal interesting insights on
the temporal dynamics of term distributions on Twitter and hold implications
for the design of search systems.
Comment: This is an extended version of a similarly-titled paper at the 6th
International AAAI Conference on Weblogs and Social Media (ICWSM 2012
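One simple way to make the notion of churn concrete is to measure how much of the top-k term list in one time window is new relative to the previous window. The sketch below is an illustrative assumption, not the metric defined in the paper; the function names `top_terms` and `churn` and the toy token lists are hypothetical.

```python
from collections import Counter


def top_terms(tokens, k):
    """Return the k most frequent terms, breaking ties alphabetically."""
    counts = Counter(tokens)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def churn(prev_tokens, curr_tokens, k):
    """Fraction of the current top-k terms that were absent from the previous top-k."""
    prev_top = set(top_terms(prev_tokens, k))
    curr_top = top_terms(curr_tokens, k)
    if not curr_top:
        return 0.0
    new_terms = [t for t in curr_top if t not in prev_top]
    return len(new_terms) / len(curr_top)


hour1 = "goal goal match match rain".split()
hour2 = "goal goal quake quake quake rain".split()
print(churn(hour1, hour2, k=2))
```

A value near 1 means that term statistics computed in the previous window say little about the current one, which is the difficulty for ranking functions that the abstract raises.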
Meetings and meeting modeling in smart surroundings
In this paper we survey our research on smart meeting rooms and its relevance for augmented
reality meeting support and virtual reality generation of meetings in real-time or off-line. Intelligent
real-time and off-line generation requires understanding of what is going on during
a meeting. The research reported here takes place in the European 5th and 6th framework
programme projects M4 (Multi-Modal Meeting Manager) and AMI (Augmented Multi-party
Interaction). Both projects aim at building a smart meeting environment that is able to capture
in a multimodal way the activities and discussions in a meeting room, with the aim to use
this information as input to tools that allow real-time support, browsing, retrieval and summarization
of meetings. In these projects many European research groups participate. Our
aim is to research (semantic) representations of what takes place during meetings in order to
allow generation, e.g. in virtual reality, of meeting activities (discussions, presentations, voting,
etcetera). Being able to do so also allows us to look at tools that provide support during
a meeting and at tools that allow those not able to be physically present during a meeting
to take part in a virtual way. This may lead to situations where the differences between
real meeting participants, human-controlled virtual participants and (semi-) autonomous virtual
participants disappear. In this paper we introduce our research aims and ideas and we
illustrate them with examples taken from many different projects in related areas
Cyber-Physical Systems Security: a Systematic Mapping Study
Cyber-physical systems are integrations of computation, networking, and
physical processes. Due to the tight cyber-physical coupling and to the
potentially disrupting consequences of failures, security here is one of the
primary concerns. Our systematic mapping study sheds some light on how security
is actually addressed when dealing with cyber-physical systems. The provided
systematic map of 118 selected studies is based on, for instance, application
fields, various system components, related algorithms and models, attacks
characteristics and defense strategies. It presents a powerful comparison
framework for existing and future research on this hot topic, important for
both industry and academia.
Comment: arXiv admin note: text overlap with arXiv:1205.5073 by other author
Composite Behavioral Modeling for Identity Theft Detection in Online Social Networks
In this work, we aim at building a bridge from poor behavioral data to an
effective, quick-response, and robust behavior model for online identity theft
detection. We concentrate on this issue in online social networks (OSNs) where
users usually have composite behavioral records, consisting of
multi-dimensional low-quality data, e.g., offline check-ins and online user
generated content (UGC). As an insightful result, we find that there is a
complementary effect among different dimensions of records for modeling users'
behavioral patterns. To deeply exploit such a complementary effect, we propose
a joint model to capture both online and offline features of a user's composite
behavior. We evaluate the proposed joint model by comparing it with several typical
models on two real-world datasets: Foursquare and Yelp. In the widely-used
setting of theft simulation (simulating thefts via behavioral replacement), the
experimental results show that our model outperforms the existing ones, with
the AUC values in Foursquare and in Yelp, respectively.
Particularly, the recall (True Positive Rate) can reach up to in
Foursquare and in Yelp with the corresponding disturbance rate (False
Positive Rate) below . It is worth mentioning that these performances can
be achieved by examining only one composite behavior (visiting a place and
posting a tip online simultaneously) per authentication, which guarantees the
low response latency of our method. This study would give the cybersecurity
community new insights into whether and how a real-time online identity
authentication can be improved via modeling users' composite behavioral
patterns
ATD: Anomalous Topic Discovery in High Dimensional Discrete Data
We propose an algorithm for detecting patterns exhibited by anomalous
clusters in high dimensional discrete data. Unlike most anomaly detection (AD)
methods, which detect individual anomalies, our proposed method detects groups
(clusters) of anomalies; i.e. sets of points which collectively exhibit
abnormal patterns. In many applications this can lead to better understanding
of the nature of the atypical behavior and to identifying the sources of the
anomalies. Moreover, we consider the case where the atypical patterns are exhibited
on only a small (salient) subset of the very high dimensional feature space.
Individual AD techniques and techniques that detect anomalies using all the
features typically fail to detect such anomalies, but our method can detect
such instances collectively, discover the shared anomalous patterns exhibited
by them, and identify the subsets of salient features. In this paper, we focus
on detecting anomalous topics in a batch of text documents, developing our
algorithm based on topic models. Results of our experiments show that our
method can accurately detect anomalous topics and salient features (words)
under each such topic in a synthetic data set and two real-world text corpora
and achieves better performance compared to both standard group AD and
individual AD techniques. All required code to reproduce our experiments is
available from https://github.com/hsoleimani/AT
Learning Execution Contexts from System Call Distributions for Intrusion Detection in Embedded Systems
Existing techniques used for intrusion detection do not fully utilize the
intrinsic properties of embedded systems. In this paper, we propose a
lightweight method for detecting anomalous executions using a distribution of
system call frequencies. We use a cluster analysis to learn the legitimate
execution contexts of embedded applications and then monitor them at run-time
to capture abnormal executions. We also present an architectural framework with
minor processor modifications to aid in this process. Our prototype shows that
the proposed method can effectively detect anomalous executions without relying
on sophisticated analyses or affecting the critical execution paths
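The general idea of learning execution contexts from system call frequency distributions and flagging executions that fall outside them can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: it uses plain k-means with a naive initialisation and a fixed Euclidean distance threshold, and every name here (`normalize`, `kmeans`, `is_anomalous`, the toy counts) is hypothetical.

```python
import math


def normalize(counts):
    """Turn raw per-syscall counts into a frequency distribution.

    Assumes at least one system call was observed in the window.
    """
    total = sum(counts)
    return [c / total for c in counts]


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iters=20):
    """Plain k-means; the first k points serve as initial centroids (naive choice)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            groups[nearest].append(p)
        for i, group in enumerate(groups):
            if group:  # keep the old centroid if a cluster ends up empty
                centroids[i] = [sum(col) / len(group) for col in zip(*group)]
    return centroids


def is_anomalous(counts, centroids, threshold):
    """Flag an execution whose distribution is far from every learned context."""
    v = normalize(counts)
    return min(dist(v, c) for c in centroids) > threshold


# Toy training data: counts of three hypothetical system calls per execution.
training = [normalize(c) for c in ([90, 10, 0], [10, 90, 0], [88, 12, 0], [12, 88, 0])]
contexts = kmeans(training, k=2)
print(is_anomalous([85, 15, 0], contexts, threshold=0.2))
print(is_anomalous([5, 5, 90], contexts, threshold=0.2))
```

The run-time check is a handful of distance computations per execution, which is in keeping with the lightweight monitoring the abstract describes; the threshold and the number of clusters would need to be chosen from data in practice.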