An efficient hybrid system for anomaly detection in social networks
Anomaly detection has been an essential and dynamic research area in data mining. A wide range of applications, including various social media platforms, have adopted state-of-the-art methods to identify anomalies and ensure users' security and privacy. A social network is a forum used by different groups of people to express their thoughts, communicate with each other, and share content. These social networks also facilitate abnormal activities, such as spreading fake news, rumours, misinformation, unsolicited messages, and propaganda, and posting malicious links. Therefore, anomaly detection is one of the important data analysis activities for distinguishing normal from abnormal users on social networks. In this paper, we have developed a hybrid anomaly detection method named DT-SVMNB that cascades several machine learning algorithms, including a decision tree (C5.0), a Support Vector Machine (SVM), and a Naïve Bayesian classifier (NBC), to classify normal and abnormal users in social networks. We have extracted a list of unique features derived from users' profiles and content. The proposed model, DT-SVMNB, is trained on two kinds of datasets using the selected features. Our model classifies users as either depressed or suicidal in the social network. We have evaluated our model experimentally using synthetic and real datasets from social networks. The performance analysis demonstrates around 98% accuracy, which indicates the effectiveness and efficiency of our proposed system. © 2021, The Author(s)
An Overview on Application of Machine Learning Techniques in Optical Networks
Today's telecommunication networks have become sources of enormous amounts of
widely heterogeneous data. This information can be retrieved from network
traffic traces, network alarms, signal quality indicators, users' behavioral
data, etc. Advanced mathematical tools are required to extract meaningful
information from these data and take decisions pertaining to the proper
functioning of the networks from the network-generated data. Among these
mathematical tools, Machine Learning (ML) is regarded as one of the most
promising methodological approaches to perform network-data analysis and enable
automated network self-configuration and fault management. The adoption of ML
techniques in the field of optical communication networks is motivated by the
unprecedented growth of network complexity faced by optical networks in the
last few years. Such complexity increase is due to the introduction of a huge
number of adjustable and interdependent system parameters (e.g., routing
configurations, modulation format, symbol rate, coding schemes, etc.) that are
enabled by the usage of coherent transmission/reception technologies, advanced
digital signal processing and compensation of nonlinear effects in optical
fiber propagation. In this paper we provide an overview of the application of
ML to optical communications and networking. We classify and survey relevant
literature dealing with the topic, and we also provide an introductory tutorial
on ML for researchers and practitioners interested in this field. Although a
good number of research papers have recently appeared, the application of ML to
optical networks is still in its infancy: to stimulate further work in this
area, we conclude the paper by proposing possible new research directions.
Understanding Behavioral Drivers in Twitter Social Media Networks
As social media platforms facilitate user interactions, organizations increasingly use social media networks (SMNs) to build network ties. Studying user behavior on SMNs can help to uncover strategic information and improve situation awareness. However, the behavioral drivers of SMN participants are poorly understood. This research developed a theoretically based IS development framework for modeling user behavior in large evolving SMNs. To demonstrate the feasibility of our framework, we developed a proof-of-concept system for simulating user activities in the SMNs of Twitter social communities. Our system models the complex behavioral features in the SMNs by using a wide range of theoretically driven features and machine-discovered features, and predicts user activities by using a pipeline of statistical and machine-learning techniques. Preliminary results of a simulation study provide insights into the importance of comprehensive network features for accurately modeling SMN group behavior and of quality of commitment features for modeling SMN user behavior.
Graph Priors, Optimal Transport, and Deep Learning in Biomedical Discovery
Recent advances in biomedical data collection allow the collection of massive datasets measuring thousands of features in thousands to millions of individual cells. This data has the potential to advance our understanding of biological mechanisms at a previously impossible resolution. However, there are few methods to understand data of this scale and type. While neural networks have made tremendous progress on supervised learning problems, there is still much work to be done in making them useful for discovery in data whose supervision is more difficult to represent. The flexibility and expressiveness of neural networks are sometimes a hindrance in these less supervised domains, as is the case when extracting knowledge from biomedical data. One type of prior knowledge that is more common in biological data comes in the form of geometric constraints. In this thesis, we aim to leverage this geometric knowledge to create scalable and interpretable models to understand this data. Encoding geometric priors into neural network and graph models allows us to characterize the models' solutions as they relate to the fields of graph signal processing and optimal transport. These links allow us to understand and interpret this datatype. We divide this work into three sections. The first borrows concepts from graph signal processing to construct more interpretable and performant neural networks by constraining and structuring the architecture. The second borrows from the theory of optimal transport to perform anomaly detection and trajectory inference efficiently and with theoretical guarantees. The third examines how to compare distributions over an underlying manifold, which can be used to understand how different perturbations or conditions relate. For this we design an efficient approximation of optimal transport based on diffusion over a joint cell graph. Together, these works utilize our prior understanding of the data geometry to create more useful models of the data.
We apply these methods to molecular graphs, images, single-cell sequencing, and health record data.
Automatic human behaviour anomaly detection in surveillance video
This thesis work focusses upon developing the capability to automatically evaluate
and detect anomalies in human behaviour from surveillance video. We work with
static monocular cameras in crowded urban surveillance scenarios, particularly
airports and commercial shopping areas. Typically a person is 100 to 200 pixels high
in a scene ranging from 10 - 20 meters width and depth, populated by 5 to 40
people at any given time. Our procedure evaluates human behaviour unobtrusively to
determine outlying behavioural events, flagging abnormal events to the operator.
In order to achieve automatic human behaviour anomaly detection we address
the challenge of interpreting behaviour within the context of the social and physical
environment. We develop and evaluate a process for measuring social connectivity
between individuals in a scene using motion and visual attention features. To do this
we use mutual information and Euclidean distance to build a social similarity matrix
which encodes the social connection strength between any two individuals. We
develop a second contextual basis which acts by segmenting a surveillance environment
into behaviourally homogeneous subregions which represent high traffic slow regions
and queuing areas. We model the heterogeneous scene in homogeneous subgroups
using both contextual elements. We bring the social contextual information, the
scene context, the motion, and visual attention features together to demonstrate
a novel human behaviour anomaly detection process which finds outlier behaviour
from a short sequence of video. The method, Nearest Neighbour Ranked Outlier
Clusters (NN-RCO), is based upon modelling behaviour as a time independent
sequence of behaviour events, and can be trained in advance or set upon a single sequence.
We find that in a crowded scene the application of Mutual Information-based social
context permits the ability to prevent self-justifying groups and propagate anomalies
in a social network, granting a greater anomaly detection capability. Scene context
uniformly improves the detection of anomalies in all the datasets we test upon.
We additionally demonstrate that our work is applicable to other data domains.
We demonstrate upon the Automatic Identification Signal data in the maritime
domain. Our work is capable of identifying abnormal shipping behaviour using joint
motion dependency as analogous for social connectivity, and similarly segmenting
the shipping environment into homogeneous regions.
Dynamics of Social Networks: Multi-agent Information Fusion, Anticipatory Decision Making and Polling
This paper surveys mathematical models, structural results and algorithms in
controlled sensing with social learning in social networks.
Part 1, namely Bayesian Social Learning with Controlled Sensing, addresses the
following questions: How does risk averse behavior in social learning affect
quickest change detection? How can information fusion be priced? How is the
convergence rate of state estimation affected by social learning? The aim is to
develop and extend structural results in stochastic control and Bayesian
estimation to answer these questions. Such structural results yield fundamental
bounds on the optimal performance, give insight into what parameters affect the
optimal policies, and yield computationally efficient algorithms.
Part 2, namely, Multi-agent Information Fusion with Behavioral Economics
Constraints generalizes Part 1. The agents exhibit sophisticated decision
making in a behavioral economics sense; namely the agents make anticipatory
decisions (thus the decision strategies are time inconsistent and interpreted
as subgame Bayesian Nash equilibria).
Part 3, namely Interactive Sensing in Large Networks, addresses the
following questions: How to track the degree distribution of an infinite random
graph with dynamics (via a stochastic approximation on a Hilbert space)? How
can the infected degree distribution of a Markov modulated power law network
and its mean field dynamics be tracked via Bayesian filtering given incomplete
information obtained by sampling the network? We also briefly discuss how the
glass ceiling effect emerges in social networks.
Part 4, namely Efficient Network Polling, deals with polling in large
scale social networks. In such networks, only a fraction of nodes can be polled
to determine their decisions. Which nodes should be polled to achieve a
statistically accurate estimates
Advances in Streaming Novelty Detection
153 p. First, this thesis addresses a confusion between terms and problems, in which the same term is used to refer to different problems and, similarly, the same problem is referred to interchangeably by different terms. This hinders progress in the field, since related literature is hard to find and work tends to be duplicated. The first contribution proposes a one-to-one assignment of terms to problems and a formalization of the learning scenarios, in an attempt to standardize the field. Second, the thesis addresses the problem of Streaming Novelty Detection. In this problem, a model is learned from a supervised dataset. The model then receives new unlabeled instances and predicts their class online, that is, in a stream. The model must be updated to cope with concept drift. In this classification scenario, it is assumed that new classes can emerge dynamically. Therefore, the model must be able to discover new classes automatically and without supervision. In this context, the thesis proposes two contributions: first, a solution based on Gaussian mixtures in which each class is modeled by one of the mixture components; second, the use of neural networks, such as autoencoder networks and Deep Support Vector Data Description networks, to work with time series.
Application of information theory and statistical learning to anomaly detection
In today's highly networked world, computer intrusions and other attacks are a constant threat. The detection of such attacks, especially attacks that are new or previously unknown, is important to secure networks and computers. A major focus of current research efforts in this area is on anomaly detection.
In this dissertation, we explore applications of information theory and statistical learning to anomaly detection. Specifically, we look at two difficult detection problems in network and system security: (1) detecting covert channels, and (2) determining if a user is a human or a bot. We link both of these problems to entropy, a measure of randomness, information content, or complexity, and a concept that is central to information theory. The behavior of bots is low in entropy when tasks are rigidly repeated or high in entropy when behavior is pseudo-random. In contrast, human behavior is complex and medium in entropy. Similarly, covert channels either create regularity, resulting in low entropy, or encode extra information, resulting in high entropy. Meanwhile, legitimate traffic is characterized by complex interdependencies and moderate entropy. In addition, we utilize statistical learning algorithms (Bayesian learning, neural networks, and maximum likelihood estimation) in both modeling and detecting covert channels and bots.
Our results using entropy and statistical learning techniques are excellent. By using entropy to detect covert channels, we detected three different covert timing channels that were not detected by previous detection methods. Then, using entropy and Bayesian learning to detect chat bots, we detected 100% of chat bots with a false positive rate of only 0.05% in over 1400 hours of chat traces. Lastly, using neural networks and the idea of human observational proofs to detect game bots, we detected 99.8% of game bots with no false positives in 95 hours of traces.
Our work shows that a combination of entropy measures and statistical learning algorithms is a powerful and highly effective tool for anomaly detection.
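The entropy argument in this abstract can be illustrated with a minimal Python sketch. It is not the dissertation's method: the function names, the 0.5-second bin width, and both synthetic traces are assumptions made up for illustration. It only shows that a rigidly repeated delay pattern has zero Shannon entropy, while pseudo-random delays approach the maximum for the chosen binning.

```python
import math
import random
from collections import Counter


def shannon_entropy(symbols):
    """Return the Shannon entropy, in bits, of a sequence of discrete symbols."""
    counts = Counter(symbols)
    total = len(symbols)
    # Sum p * log2(1/p) over the observed symbols.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def bin_delays(delays, bin_width=0.5):
    """Map continuous inter-message delays (seconds) to integer bin indices.

    The bin width is an arbitrary illustrative choice.
    """
    return [int(d // bin_width) for d in delays]


# Hypothetical traces, invented for illustration only.
periodic_bot = [2.0] * 64  # the same delay repeated rigidly
rng = random.Random(0)
random_bot = [rng.uniform(0.0, 8.0) for _ in range(64)]  # pseudo-random delays

h_periodic = shannon_entropy(bin_delays(periodic_bot))
h_random = shannon_entropy(bin_delays(random_bot))
h_max = math.log2(16)  # 16 bins of width 0.5 s cover the range [0, 8)

print(f"periodic: {h_periodic:.2f} bits")
print(f"random:   {h_random:.2f} bits (maximum {h_max:.2f})")
```

Under the abstract's description, human behavior would fall between these two extremes. Where the decision thresholds lie is not stated in the abstract and is left open here.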