246,459 research outputs found

    A Survey on Social Media Anomaly Detection

    Social media anomaly detection is of critical importance to prevent malicious activities such as bullying, terrorist attack planning, and fraudulent information dissemination. With the recent popularity of social media, new types of anomalous behaviors arise, causing concerns from various parties. While a large amount of work has been dedicated to traditional anomaly detection problems, we observe a surge of research interest in the new realm of social media anomaly detection. In this paper, we present a survey on existing approaches to address this problem. We focus on the new types of anomalous phenomena in social media and review the recently developed techniques to detect those special types of anomalies. We provide a general overview of the problem domain, common formulations, existing methodologies and potential directions. With this work, we hope to draw the research community's attention to this challenging problem and open up new directions that we can contribute to in the future. Comment: 23 pages

    Semantic Scan: Detecting Subtle, Spatially Localized Events in Text Streams

    Early detection and precise characterization of emerging topics in text streams can be highly useful in applications such as timely and targeted public health interventions and discovering evolving regional business trends. Many methods have been proposed for detecting emerging events in text streams using topic modeling. However, these methods have numerous shortcomings that make them unsuitable for rapid detection of locally emerging events on massive text streams. In this paper, we describe Semantic Scan (SS), which has been developed specifically to overcome these shortcomings in detecting new spatially compact events in text streams. Semantic Scan integrates novel contrastive topic modeling with online document assignment and principled likelihood ratio-based spatial scanning to identify emerging events with unexpected patterns of keywords hidden in text streams. This enables more timely and accurate detection and characterization of anomalous, spatially localized emerging events. Semantic Scan does not require manual intervention or labeled training data, and is robust to noise in real-world text data since it identifies anomalous text patterns that occur in a cluster of new documents rather than an anomaly in a single new document. We compare Semantic Scan to alternative state-of-the-art methods such as Topics over Time, Online LDA, and Labeled LDA on two real-world tasks: (i) a disease surveillance task monitoring free-text Emergency Department chief complaints in Allegheny County, and (ii) an emerging business trend detection task based on Yelp reviews. On both tasks, we find that Semantic Scan provides significantly better event detection and characterization accuracy than competing approaches, while providing up to an order of magnitude speedup. Comment: 10 pages, 4 figures, KDD 2016 submission

    Automatic Detection of Trends in Dynamical Text: An Evolutionary Approach

    This paper presents an evolutionary algorithm for modeling the arrival dates of document streams, which are any time-stamped collections of documents, such as newscasts, e-mails, IRC conversations, scientific journal archives and weblog postings. This algorithm assigns frequencies (number of document arrivals per time unit) to time intervals so that it produces an optimal fit to the data. The optimization is a trade-off between accurately fitting the data and avoiding too many frequency changes; this way the analysis is able to find fits which ignore the noise. Classical dynamic programming algorithms are limited by memory and efficiency requirements, which can be a problem when dealing with long streams. This suggests exploring alternative search methods which allow for some degree of uncertainty to achieve tractability. Experiments have shown that the designed evolutionary algorithm is able to reach the same solution quality as those classical dynamic programming algorithms in a shorter time. We have also explored different probabilistic models to optimize the fitting of the date streams, and applied these algorithms to infer whether a new arrival increases or decreases interest in the topic the document stream is about. Comment: 22 pages, submitted to Journal of Information Retrieval
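The trade-off described in this abstract (fit the arrival counts closely, but pay for every frequency change) can be sketched with the classical dynamic programming baseline that the abstract compares against. This is not the authors' evolutionary algorithm: the squared-error cost, the `penalty` parameter, and the function name `fit_piecewise_rates` are assumptions chosen for illustration only.

```python
from typing import List, Tuple


def fit_piecewise_rates(counts: List[int], penalty: float) -> List[Tuple[int, int, float]]:
    """Split `counts` (arrivals per time unit) into constant-rate segments.

    Minimizes the squared error of each segment around its mean, plus
    `penalty` per segment, using an O(n^2) dynamic program.
    Returns a list of (start, end_exclusive, rate) tuples.
    """
    n = len(counts)
    # Prefix sums of values and squared values give any segment's error in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, c in enumerate(counts):
        s1[i + 1] = s1[i] + c
        s2[i + 1] = s2[i] + c * c

    def seg_cost(i: int, j: int) -> float:
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    best = [0.0] + [float("inf")] * n  # best[j]: minimal cost of counts[:j]
    prev = [0] * (n + 1)               # prev[j]: start of the last segment
    for j in range(1, n + 1):
        for i in range(j):
            cost = best[i] + seg_cost(i, j) + penalty
            if cost < best[j]:
                best[j] = cost
                prev[j] = i

    # Walk the back-pointers to recover the segments in order.
    segments = []
    j = n
    while j > 0:
        i = prev[j]
        segments.append((i, j, (s1[j] - s1[i]) / (j - i)))
        j = i
    segments.reverse()
    return segments
```

A larger `penalty` yields fewer frequency changes, so noise is ignored; a smaller one tracks the data more closely. The quadratic running time of this baseline is the kind of cost that, per the abstract, motivates alternative search methods for long streams.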

    Location-Based Events Detection on Micro-Blogs

    The increasing use of social networks generates enormous amounts of data that can be used for many types of analysis. Some of these data have temporal and geographical information, which can be used for comprehensive examination. In this paper, we propose a new method to analyze the massive volume of messages available in Twitter to identify places in the world where topics such as TV shows, climate change, disasters, and sports are emerging. The proposed method is based on a neural network that is used to detect outliers from a time series, which is built upon statistical data from tweets located in different political divisions (i.e., countries, cities). The outliers are used to identify topics with abnormal behavior in Twitter. The effectiveness of our method is evaluated in an online environment indicating new findings on modeling local people's behavior from different places. Comment: 10 pages, 5 figures, submitted and rejected for SBBD 201

    A Study of "Churn" in Tweets and Real-Time Search Queries (Extended Version)

    The real-time nature of Twitter means that term distributions in tweets and in search queries change rapidly: the most frequent terms in one hour may look very different from those in the next. Informally, we call this phenomenon "churn". Our interest in analyzing churn stems from the perspective of real-time search. Nearly all ranking functions, machine-learned or otherwise, depend on term statistics such as term frequency, document frequency, as well as query frequencies. In the real-time context, how do we compute these statistics, considering that the underlying distributions change rapidly? In this paper, we present an analysis of tweet and query churn on Twitter, as a first step to answering this question. Analyses reveal interesting insights on the temporal dynamics of term distributions on Twitter and hold implications for the design of search systems. Comment: This is an extended version of a similarly-titled paper at the 6th International AAAI Conference on Weblogs and Social Media (ICWSM 2012)
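One simple way to make "churn" concrete is to compare the most frequent terms of two consecutive time windows. The sketch below is an illustrative definition, not the metric used in the paper (the abstract does not state one): the whitespace tokenizer, the cutoff `k`, and the function names `top_terms` and `churn` are all assumptions.

```python
from collections import Counter
from typing import Iterable, Set


def top_terms(docs: Iterable[str], k: int) -> Set[str]:
    """Return the k most frequent lowercase whitespace-separated tokens."""
    counts = Counter(tok for doc in docs for tok in doc.lower().split())
    return {term for term, _ in counts.most_common(k)}


def churn(prev_docs: Iterable[str], curr_docs: Iterable[str], k: int = 3) -> float:
    """Fraction of the current window's top-k terms absent from the previous top-k.

    0.0 means the top terms are unchanged; 1.0 means they were fully replaced.
    """
    prev_top = top_terms(prev_docs, k)
    curr_top = top_terms(curr_docs, k)
    if not curr_top:
        return 0.0
    return len(curr_top - prev_top) / len(curr_top)
```

Under this definition, a ranking function that caches term statistics from the previous hour would be stale for a share of its top terms roughly equal to the churn value, which is the practical concern the abstract raises for real-time search.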

    Meetings and meeting modeling in smart surroundings

    In this paper we survey our research on smart meeting rooms and its relevance for augmented reality meeting support and virtual reality generation of meetings in real-time or off-line. Intelligent real-time and off-line generation requires understanding of what is going on during a meeting. The research reported here takes place in the European 5th and 6th framework programme projects M4 (Multi-Modal Meeting Manager) and AMI (Augmented Multi-party Interaction). Both projects aim at building a smart meeting environment that is able to capture in a multimodal way the activities and discussions in a meeting room, with the aim to use this information as input to tools that allow real-time support, browsing, retrieval and summarization of meetings. In these projects many European research groups participate. Our aim is to research (semantic) representations of what takes place during meetings in order to allow generation, e.g. in virtual reality, of meeting activities (discussions, presentations, voting, etcetera). Being able to do so also allows us to look at tools that provide support during a meeting and at tools that allow those not able to be physically present during a meeting to take part in a virtual way. This may lead to situations where the differences between real meeting participants, human-controlled virtual participants and (semi-) autonomous virtual participants disappear. In this paper we introduce our research aims and ideas and we illustrate them with examples taken from many different projects in related areas.

    Cyber-Physical Systems Security: a Systematic Mapping Study

    Cyber-physical systems are integrations of computation, networking, and physical processes. Due to the tight cyber-physical coupling and to the potentially disrupting consequences of failures, security here is one of the primary concerns. Our systematic mapping study sheds some light on how security is actually addressed when dealing with cyber-physical systems. The provided systematic map of 118 selected studies is based on, for instance, application fields, various system components, related algorithms and models, attack characteristics and defense strategies. It presents a powerful comparison framework for existing and future research on this hot topic, important for both industry and academia. Comment: arXiv admin note: text overlap with arXiv:1205.5073 by other authors

    Composite Behavioral Modeling for Identity Theft Detection in Online Social Networks

    In this work, we aim at building a bridge from poor behavioral data to an effective, quick-response, and robust behavior model for online identity theft detection. We concentrate on this issue in online social networks (OSNs) where users usually have composite behavioral records, consisting of multi-dimensional low-quality data, e.g., offline check-ins and online user generated content (UGC). As an insightful result, we find that there is a complementary effect among different dimensions of records for modeling users' behavioral patterns. To deeply exploit such a complementary effect, we propose a joint model to capture both online and offline features of a user's composite behavior. We evaluate the proposed joint model by comparing it with some typical models on two real-world datasets: Foursquare and Yelp. In the widely-used setting of theft simulation (simulating thefts via behavioral replacement), the experimental results show that our model outperforms the existing ones, with AUC values of 0.956 in Foursquare and 0.947 in Yelp, respectively. Particularly, the recall (True Positive Rate) can reach up to 65.3% in Foursquare and 72.2% in Yelp with the corresponding disturbance rate (False Positive Rate) below 1%. It is worth mentioning that these performances can be achieved by examining only one composite behavior (visiting a place and posting a tip online simultaneously) per authentication, which guarantees the low response latency of our method. This study would give the cybersecurity community new insights into whether and how a real-time online identity authentication can be improved via modeling users' composite behavioral patterns.
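The abstract reports recall (True Positive Rate) at a disturbance rate (False Positive Rate) below 1%. As a hedged illustration of how such a figure is read off a set of anomaly scores, the sketch below picks the decision threshold that maximizes recall while keeping the false positive rate within a bound. The function name `tpr_at_fpr`, the "higher score means more suspicious" convention, and the toy scores in the usage note are assumptions, not details from the paper.

```python
from typing import Sequence


def tpr_at_fpr(pos_scores: Sequence[float], neg_scores: Sequence[float], max_fpr: float) -> float:
    """Best recall achievable with a false positive rate of at most `max_fpr`.

    `pos_scores` are scores of true thefts, `neg_scores` of legitimate
    behavior; a sample is flagged when its score is >= the threshold.
    Both sequences must be non-empty.
    """
    best_tpr = 0.0
    # Only observed scores need to be tried as thresholds.
    for threshold in sorted(set(pos_scores) | set(neg_scores)):
        fpr = sum(s >= threshold for s in neg_scores) / len(neg_scores)
        if fpr <= max_fpr:
            tpr = sum(s >= threshold for s in pos_scores) / len(pos_scores)
            best_tpr = max(best_tpr, tpr)
    return best_tpr
```

For example, with legitimate scores `[0.1, 0.2, 0.3, 0.4]` and theft scores `[0.35, 0.6, 0.7, 0.05]`, allowing a false positive rate of 0.25 gives a recall of 0.75, while allowing none gives 0.5.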

    ATD: Anomalous Topic Discovery in High Dimensional Discrete Data

    We propose an algorithm for detecting patterns exhibited by anomalous clusters in high dimensional discrete data. Unlike most anomaly detection (AD) methods, which detect individual anomalies, our proposed method detects groups (clusters) of anomalies; i.e. sets of points which collectively exhibit abnormal patterns. In many applications this can lead to better understanding of the nature of the atypical behavior and to identifying the sources of the anomalies. Moreover, we consider the case where the atypical patterns are exhibited on only a small (salient) subset of the very high dimensional feature space. Individual AD techniques and techniques that detect anomalies using all the features typically fail to detect such anomalies, but our method can detect such instances collectively, discover the shared anomalous patterns exhibited by them, and identify the subsets of salient features. In this paper, we focus on detecting anomalous topics in a batch of text documents, developing our algorithm based on topic models. Results of our experiments show that our method can accurately detect anomalous topics and salient features (words) under each such topic in a synthetic data set and two real-world text corpora, and that it achieves better performance than both standard group AD and individual AD techniques. All required code to reproduce our experiments is available from https://github.com/hsoleimani/AT

    Learning Execution Contexts from System Call Distributions for Intrusion Detection in Embedded Systems

    Existing techniques used for intrusion detection do not fully utilize the intrinsic properties of embedded systems. In this paper, we propose a lightweight method for detecting anomalous executions using a distribution of system call frequencies. We use a cluster analysis to learn the legitimate execution contexts of embedded applications and then monitor them at run-time to capture abnormal executions. We also present an architectural framework with minor processor modifications to aid in this process. Our prototype shows that the proposed method can effectively detect anomalous executions without relying on sophisticated analyses or affecting the critical execution paths.
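The run-time check described in this abstract can be sketched as follows: turn a trace of system calls into a frequency distribution, then flag the execution if that distribution is far from every learned cluster of legitimate behavior. This is a minimal sketch under stated assumptions, not the paper's method: the centroids are taken as given (the output of some prior clustering step), and the Euclidean distance, the `radius` threshold, and the names `call_distribution` and `is_anomalous` are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def call_distribution(trace: Sequence[str], vocabulary: Sequence[str]) -> List[float]:
    """Relative frequency of each system call in `vocabulary` within `trace`.

    Calls outside the vocabulary are ignored; an empty trace maps to all zeros.
    """
    counts = Counter(trace)
    total = sum(counts[call] for call in vocabulary)
    if total == 0:
        return [0.0] * len(vocabulary)
    return [counts[call] / total for call in vocabulary]


def is_anomalous(dist: Sequence[float], centroids: Sequence[Sequence[float]], radius: float) -> bool:
    """True when `dist` lies farther than `radius` from every centroid.

    `centroids` are assumed to come from clustering distributions observed
    during legitimate executions; `radius` is an assumed tuning parameter.
    """
    nearest = min(math.dist(dist, centroid) for centroid in centroids)
    return nearest > radius
```

Because the check only needs call counts and a handful of distance computations, it stays lightweight, which matches the abstract's goal of avoiding sophisticated analyses at run-time.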