1,383 research outputs found

    Anomaly detection in the dynamics of web and social networks

    Get PDF
    In this work, we propose a new, fast and scalable method for anomaly detection in large time-evolving graphs. It may be a static graph with dynamic node attributes (e.g. time-series), or a graph evolving in time, such as a temporal network. We define an anomaly as a localized increase in temporal activity in a cluster of nodes. The algorithm is unsupervised. It is able to detect and track anomalous activity in a dynamic network despite the noise from multiple interfering sources. We use the Hopfield network model of memory to combine the graph and time information. We show that anomalies can be spotted with a good precision using a memory network. The presented approach is scalable and we provide a distributed implementation of the algorithm. To demonstrate its efficiency, we apply it to two datasets: Enron Email dataset and Wikipedia page views. We show that the anomalous spikes are triggered by the real-world events that impact the network dynamics. Besides, the structure of the clusters and the analysis of the time evolution associated with the detected events reveals interesting facts on how humans interact, exchange and search for information, opening the door to new quantitative studies on collective and social behavior on large and dynamic datasets.Comment: The Web Conference 2019, 10 pages, 7 figure

    Energy Transformer

    Full text link
    Transformers have become the de facto models of choice in machine learning, typically leading to impressive performance on many applications. At the same time, the architectural development in the transformer world is mostly driven by empirical findings, and the theoretical understanding of their architectural building blocks is rather limited. In contrast, Dense Associative Memory models or Modern Hopfield Networks have a well-established theoretical foundation, but have not yet demonstrated truly impressive practical results. We propose a transformer architecture that replaces the sequence of feedforward transformer blocks with a single large Associative Memory model. Our novel architecture, called Energy Transformer (or ET for short), has many of the familiar architectural primitives that are often used in the current generation of transformers. However, it is not identical to the existing architectures. The sequence of transformer layers in ET is purposely designed to minimize a specifically engineered energy function, which is responsible for representing the relationships between the tokens. As a consequence of this computational principle, the attention in ET is different from the conventional attention mechanism. In this work, we introduce the theoretical foundations of ET, explore it's empirical capabilities using the image completion task, and obtain strong quantitative results on the graph anomaly detection task

    A Graph-structured Dataset for Wikipedia Research

    Get PDF
    Wikipedia is a rich and invaluable source of information. Its central place on the Web makes it a particularly interesting object of study for scientists. Researchers from different domains used various complex datasets related to Wikipedia to study language, social behavior, knowledge organization, and network theory. While being a scientific treasure, the large size of the dataset hinders pre-processing and may be a challenging obstacle for potential new studies. This issue is particularly acute in scientific domains where researchers may not be technically and data processing savvy. On one hand, the size of Wikipedia dumps is large. It makes the parsing and extraction of relevant information cumbersome. On the other hand, the API is straightforward to use but restricted to a relatively small number of requests. The middle ground is at the mesoscopic scale when researchers need a subset of Wikipedia ranging from thousands to hundreds of thousands of pages but there exists no efficient solution at this scale. In this work, we propose an efficient data structure to make requests and access subnetworks of Wikipedia pages and categories. We provide convenient tools for accessing and filtering viewership statistics or "pagecounts" of Wikipedia web pages. The dataset organization leverages principles of graph databases that allows rapid and intuitive access to subgraphs of Wikipedia articles and categories. The dataset and deployment guidelines are available on the LTS2 website \url{https://lts2.epfl.ch/Datasets/Wikipedia/}

    Anomaly detection in the dynamics of web and social networks

    Get PDF
    In this work, we propose a new, fast and scalable method for anomaly detection in large time-evolving graphs. It may be a static graph with dynamic node attributes (e.g. time-series), or a graph evolving in time, such as a temporal network. We define an anomaly as a localized increase in temporal activity in a cluster of nodes. The algorithm is unsupervised. It is able to detect and track anomalous activity in a dynamic network despite the noise from multiple interfering sources. We use the Hopfield network model of memory to combine the graph and time information. We show that anomalies can be spotted with good precision using a memory network. The presented approach is scalable and we provide a distributed implementation of the algorithm. To demonstrate its efficiency, we apply it to two datasets: Enron Email dataset and Wikipedia page views. We show that the anomalous spikes are triggered by the real-world events that impact the network dynamics. Besides, the structure of the clusters and the analysis of the time evolution associated with the detected events reveals interesting facts on how humans interact, exchange and search for information, opening the door to new quantitative studies on collective and social behavior on large and dynamic datasets

    INTRUSION DETECTION OF A SIMULATED SCADA SYSTEM USING A DATA-DRIVEN MODELING APPROACH

    Get PDF
    Supervisory Control and Data Acquisition (SCADA) are large, geographically distributed systems that regulate help processes in industries such as nuclear power, transportation or manufacturing. SCADA is a combination of physical, sensing, and communications equipment that is used for monitoring, control and telemetry acquisition actions. Because SCADA often control the distribution of vital resources such as electricity and water, there is a need to protect these cyber-physical systems from those with possible malicious intent. To this end, an Intrusion Detection System (IDS) is utilized to monitor telemetry sources in order to detect unwanted activities and maintain overall system integrity. This dissertation presents the results in developing a behavior-based approach to intrusion detection using a simulated SCADA test bed. Empirical modeling techniques known as Auto Associative Kernel Regression (AAKR) and Auto Associative Multivariate State Estimation Technique (AAMSET) are used to learn the normal behavior of the test bed. The test bed was then subjected to repeated intrusion injection experiments using penetration testing software and exploit codes. Residuals generated from these experiments are then supplied to an anomaly detection algorithm known as the Sequential Probability Ratio Test (SPRT). This approach is considered novel in that the AAKR and AAMSET, combined with the SPRT, have not been utilized previously in industry for cybersecurity purposes. Also presented in this dissertation is a newly developed variable grouping algorithm that is based on the Auto Correlation Function (ACF) for a given set of input data. Variable grouping is needed for these modeling methods to arrive at a suitable set of predictors that return the lowest error in model performance. The developed behavior-based techniques were able to successfully detect many types of intrusions that include network reconnaissance, DoS, unauthorized access, and information theft. These methods would then be useful in detecting unwanted activities of intruders from both inside and outside of the monitored network. These developed methods would also serve to add an additional layer of security. When compared with two separate variable grouping methods, the newly developed grouping method presented in this dissertation was shown to extract similar groups or groups with lower average model prediction errors

    A Critical Analysis Of The State-Of-The-Art On Automated Detection Of Deceptive Behavior In Social Media

    Get PDF
    Recently, a large body of research has been devoted to examine the user behavioral patterns and the business implications of social media. However, relatively little research has been conducted regarding users’ deceptive activities in social media; these deceptive activities may hinder the effective application of the data collected from social media to perform e-marketing and initiate business transformation in general. One of the main contributions of this paper is the critical analysis of the possible forms of deceptive behavior in social media and the state-of-the-art technologies for automated deception detection in social media. Based on the proposed taxonomy of major deception types, the assumptions, advantages, and disadvantages of the popular deception detection methods are analyzed. Our critical analysis shows that deceptive behavior may evolve over time, and so making it difficult for the existing methods to effectively detect social media spam. Accordingly, another main contribution of this paper is the design and development of a generic framework to combat dynamic deceptive activities in social media. The managerial implication of our research is that business managers or marketers will develop better insights about the possible deceptive behavior in social media before they tap into social media to collect and generate market intelligence. Moreover, they can apply the proposed adaptive deception detection framework to more effectively combat the ever increasing and evolving deceptive activities in social medi

    Big data analytics for preventive medicine

    Get PDF
    © 2019, Springer-Verlag London Ltd., part of Springer Nature. Medical data is one of the most rewarding and yet most complicated data to analyze. How can healthcare providers use modern data analytics tools and technologies to analyze and create value from complex data? Data analytics, with its promise to efficiently discover valuable pattern by analyzing large amount of unstructured, heterogeneous, non-standard and incomplete healthcare data. It does not only forecast but also helps in decision making and is increasingly noticed as breakthrough in ongoing advancement with the goal is to improve the quality of patient care and reduces the healthcare cost. The aim of this study is to provide a comprehensive and structured overview of extensive research on the advancement of data analytics methods for disease prevention. This review first introduces disease prevention and its challenges followed by traditional prevention methodologies. We summarize state-of-the-art data analytics algorithms used for classification of disease, clustering (unusually high incidence of a particular disease), anomalies detection (detection of disease) and association as well as their respective advantages, drawbacks and guidelines for selection of specific model followed by discussion on recent development and successful application of disease prevention methods. The article concludes with open research challenges and recommendations
    • …
    corecore