Communities and Anomaly Detection in Large Edge-Labeled Graphs

Author: Miguel Ramos de Araújo
Publication date: 10/05/2017

Repositório Aberto da Universidade do Porto

One size does not fit all : profiling personalized time-evolving user behaviors

Author: Devineni Pravallika
Doğruöz A. Seza
Faloutsos Michalis
Koutra Danai
Papalexakis Evangelos E.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2017

Given the set of social interactions of a user, how can we detect changes in interaction patterns over time? While most previous work has focused on studying network-wide properties and spotting outlier users, the dynamics of individual user interactions remain largely unexplored. This work sets out to explore those dynamics in a way that is minimally invasive to privacy and thus avoids relying on the textual content of user posts, except for validation. Our contributions are two-fold. First, in contrast to previous studies, we challenge the use of a fixed interval of observation. We introduce and empirically validate the "Temporal Asymmetry Hypothesis", which states that appropriate observation intervals should vary both among users and over time for the same user. We validate this hypothesis using eight different datasets, including email, messaging, and social network data. Second, we propose iNET, a comprehensive analytic and visualization framework which provides personalized insights into user behavior and operates in a streaming fashion. iNET learns personalized baseline behaviors of users and uses them to identify events that signify changes in user behavior. We evaluate the effectiveness of iNET by analyzing more than half a million interactions from Facebook users. Labeling of the identified changes in user behavior showed that iNET is able to capture a wide spectrum of exogenous and endogenous events, while the baselines are less diverse in nature and capture only 66% of that spectrum. Furthermore, iNET exhibited the highest precision (95%) among all competing approaches.

Ghent University Academic Bibliography

Graph based Anomaly Detection and Description: A Survey

Author: Danai Koutra
Hanghang Tong
Leman Akoglu
Publication date: 28/04/2014

Detecting anomalies in data is a vital task, with numerous high-impact applications in areas such as security, finance, health care, and law enforcement. Numerous techniques have been developed in past years for spotting outliers and anomalies in unstructured collections of multi-dimensional points; with graph data becoming ubiquitous, techniques for structured graph data have recently come into focus. As objects in graphs have long-range correlations, a suite of novel techniques has been developed for anomaly detection in graph data. This survey aims to provide a general, comprehensive, and structured overview of the state-of-the-art methods for anomaly detection in data represented as graphs. As a key contribution, we give a general framework for the algorithms categorized under various settings: unsupervised vs. (semi-)supervised approaches, for static vs. dynamic graphs, for attributed vs. plain graphs. We highlight the effectiveness, scalability, generality, and robustness aspects of the methods. What is more, we stress the importance of anomaly attribution and highlight the major techniques that facilitate digging out the root cause, or the ‘why’, of the detected anomalies for further analysis and sense-making. Finally, we present several real-world applications of graph-based anomaly detection in diverse domains, including financial, auction, computer traffic, and social networks. We conclude our survey with a discussion on open theoretical and practical challenges in the field.

arXiv.org e-Print Archive

CiteSeerX

Summarizing Dynamic Graphs using MDL

Author: Saran Divyam
Vreeken Jilles
Publication date: 01/01/2019

How can we succinctly describe a large, dynamic graph over time? Given a large dynamic graph, can we find "important" patterns that evolve over time, so that we can easily summarize and visualize the graph? In real life, these patterns signify interaction between nodes over time: for example, how the network traffic of a bank changes during the day, how calling patterns change season over season, or how people watch different genres of movies at different times of the year. Our work focuses on the problem of how to find and rank these patterns. To this end, we formalize this problem as minimizing the encoding cost in a data compression paradigm and propose Mango, an effective heuristic for discovering evolving patterns in dynamic graphs. We then apply our method to both synthetic and real datasets to show that Mango is able to summarize dynamic graphs by meaningful static and temporal patterns.

CISPA – Helmholtz-Zentrum für Informationssicherheit

The Minimum Description Length Principle for Pattern Mining: A Survey

Author: Galbrun Esther
Publication date: 28/07/2021

This is about the Minimum Description Length (MDL) principle applied to pattern mining. The length of this description is kept to the minimum. Mining patterns is a core task in data analysis and, beyond issues of efficient enumeration, the selection of patterns constitutes a major challenge. The MDL principle, a model selection method grounded in information theory, has been applied to pattern mining with the aim to obtain compact high-quality sets of patterns. After giving an outline of relevant concepts from information theory and coding, as well as of work on the theory behind the MDL and similar principles, we review MDL-based methods for mining various types of data and patterns. Finally, we open a discussion on some issues regarding these methods, and highlight currently active related data analysis problems

arXiv.org e-Print Archive

On the Nature and Types of Anomalies: A Review

Author: Foorthuis Ralph
Publication date: 27/12/2020

Anomalies are occurrences in a dataset that are in some way unusual and do not fit the general patterns. The concept of the anomaly is generally ill-defined and perceived as vague and domain-dependent. Moreover, despite some 250 years of publications on the topic, no comprehensive and concrete overviews of the different types of anomalies have hitherto been published. By means of an extensive literature review this study therefore offers the first theoretically principled and domain-independent typology of data anomalies, and presents a full overview of anomaly types and subtypes. To concretely define the concept of the anomaly and its different manifestations, the typology employs five dimensions: data type, cardinality of relationship, anomaly level, data structure and data distribution. These fundamental and data-centric dimensions naturally yield 3 broad groups, 9 basic types and 61 subtypes of anomalies. The typology facilitates the evaluation of the functional capabilities of anomaly detection algorithms, contributes to explainable data science, and provides insights into relevant topics such as local versus global anomalies.

Comment: 38 pages (30 pages content), 10 figures, 3 tables. Preprint; review comments will be appreciated. Improvements in version 2: Explicit mention of fifth anomaly dimension; Added section on explainable anomaly detection; Added section on variations on the anomaly concept; Various minor additions and improvements.

arXiv.org e-Print Archive

Harnessing rare category trinity for complex data

Author: Zhou Dawei
Publication date: 01/12/2021

In the era of big data, we are inundated with the sheer volume of data being collected from various domains. In contrast, it is often the rare occurrences that are crucially important to many high-impact domains with diverse data types. For example, in online transaction platforms, the percentage of fraudulent transactions might be small, but the resultant financial loss could be significant; in social networks, a novel topic is often neglected by the majority of users at the initial stage, but it could burst into an emerging trend afterward; in the Sloan Digital Sky Survey, the vast majority of sky images (e.g., known stars, comets, nebulae, etc.) are of no interest to the astronomers, while only 0.001% of the sky images lead to novel scientific discoveries; in the worldwide pandemics (e.g., SARS, MERS, COVID19, etc.), the primary cases might be limited, but the consequences could be catastrophic (e.g., mass mortality and economic recession). Therefore, studying such complex rare categories has profound significance and longstanding impact in many aspects of modern society, from preventing financial fraud to uncovering hot topics and trends, from supporting scientific research to forecasting pandemics and natural disasters. In this thesis, we propose a generic learning mechanism with trinity modules for complex rare category analysis: (M1) Rare Category Characterization - characterizing the rare patterns with a compact representation; (M2) Rare Category Explanation - interpreting the prediction results and providing relevant clues for the end-users; (M3) Rare Category Generation - producing synthetic rare category examples that resemble the real ones. The key philosophy of our mechanism lies in "all for one and one for all" - each module makes unique contributions to the whole mechanism and thus receives support from its companions.
In particular, M1 serves as the de-novo step to discover rare category patterns on complex data; M2 provides a proper lens to the end-users to examine the outputs and understand the learning process; and M3 synthesizes real rare category examples for data augmentation to further improve M1 and M2. To enrich the learning mechanism, we develop principled theorems and solutions to characterize, understand, and synthesize rare categories in complex scenarios, ranging from static rare categories to time-evolving rare categories, from attributed data to graph-structured data, from homogeneous data to heterogeneous data, from low-order connectivity patterns to high-order connectivity patterns, etc. It is worth mentioning that we have also launched one of the first visual analytic systems for dynamic rare category analysis, which integrates our developed techniques and enables users to investigate complex rare categories in practice.

Illinois Digital Environment for Access to Learning and Scholarship Repository

Recommended from our members

Effects of Impact Cratering on the Microbial Biosphere of the Deep Terrestrial Subsurface

Author: Gronstal Aaron Lee
Publication date: 01/01/2008

The 2005 ICDP-USGS deep drilling of the Chesapeake Bay Impact Structure (CBIS) returned the first complete core through an impact structure. A strict set of contamination assessment measures was implemented during sample collection to ensure that materials could be confidently used in geobiology, molecular biology and microbiology studies. Through direct cell counting, culturing and molecular analysis, samples offered a unique opportunity to characterize the subsurface microbial community present at depth in an impact structure. This work outlines how subsurface habitats can recover after impacts, and how impacts act to generate new microenvironments where microorganisms can colonize. Geobiology studies revealed a pattern of microbial abundance that corresponds to lithological transitions within the crater structure. Three 'zones' of abundance were defined, with the first showing a steeper logarithmic decline in cell numbers than seen in other deep subsurface environments. This is followed by a zone of cell numbers below the detection limit of the methods used. Finally, the deepest section of the core shows an increase in cell numbers, indicating that recolonisation has occurred following the impact event. Culturing studies were consistent with the results of enumerations, with successful cultures retrieved from microbiological zones 1 and 3. The majority of cultures were acquired using heterotrophic media, although cultures were also returned with media for iron reducers, iron oxidizers, sulfate reducers and humic acid utilisers. Culturing studies and molecular studies showed that a diverse consortium of microorganisms is present in the deep subsurface of the CBIS. Finally, the ability of microorganisms to access nutrients and minerals from meteoritic material was analyzed.
The results of these studies add to our knowledge of how impact events can affect subsurface microbial habitats, both directly by kinetic disruption of the environment and through the delivery of exogenous materials.

Open Research Online (The Open University)

OpenGrey Repository

Sentiment Analysis for Social Media

Author: Iglesias Carlos A.
Moreno Antonio
Publication venue: 'MDPI AG'
Publication date: 09/06/2020

Sentiment analysis is a branch of natural language processing concerned with the study of the intensity of the emotions expressed in a piece of text. The automated analysis of the multitude of messages delivered through social media is one of the hottest research fields, both in academia and in industry, due to its extremely high potential applicability in many different domains. This Special Issue describes both technological contributions to the field, mostly based on deep learning techniques, and specific applications in areas like health insurance, gender classification, recommender systems, and cyber aggression detection.

Directory of Open Access Books (DOAB)