351 research outputs found
Generating realistic scaled complex networks
Research on generative models is a central project in the emerging field of
network science, and it studies how statistical patterns found in real networks
could be generated by formal rules. Output from these generative models is then
the basis for designing and evaluating computational methods on networks, and
for verification and simulation studies. During the last two decades, a variety
of models has been proposed with an ultimate goal of achieving comprehensive
realism for the generated networks. In this study, we (a) introduce a new
generator, termed ReCoN; (b) explore how ReCoN and some existing models can be
fitted to an original network to produce a structurally similar replica, (c)
use ReCoN to produce networks much larger than the original exemplar, and
finally (d) discuss open problems and promising research directions. In a
comparative experimental study, we find that ReCoN is often superior to many
other state-of-the-art network generation methods. We argue that ReCoN is a
scalable and effective tool for modeling a given network while preserving
important properties at both micro- and macroscopic scales, and for scaling the
exemplar data by orders of magnitude in size.Comment: 26 pages, 13 figures, extended version, a preliminary version of the
paper was presented at the 5th International Workshop on Complex Networks and
their Application
Multi-Source Data Fusion for Cyberattack Detection in Power Systems
Cyberattacks can cause a severe impact on power systems unless detected
early. However, accurate and timely detection in critical infrastructure
systems presents challenges, e.g., due to zero-day vulnerability exploitations
and the cyber-physical nature of the system coupled with the need for high
reliability and resilience of the physical system. Conventional rule-based and
anomaly-based intrusion detection system (IDS) tools are insufficient for
detecting zero-day cyber intrusions in the industrial control system (ICS)
networks. Hence, in this work, we show that fusing information from multiple
data sources can help identify cyber-induced incidents and reduce false
positives. Specifically, we present how to recognize and address the barriers
that can prevent the accurate use of multiple data sources for fusion-based
detection. We perform multi-source data fusion for training IDS in a
cyber-physical power system testbed where we collect cyber and physical side
data from multiple sensors emulating real-world data sources that would be
found in a utility and synthesizes these into features for algorithms to detect
intrusions. Results are presented using the proposed data fusion application to
infer False Data and Command injection-based Man-in- The-Middle (MiTM) attacks.
Post collection, the data fusion application uses time-synchronized merge and
extracts features followed by pre-processing such as imputation and encoding
before training supervised, semi-supervised, and unsupervised learning models
to evaluate the performance of the IDS. A major finding is the improvement of
detection accuracy by fusion of features from cyber, security, and physical
domains. Additionally, we observed the co-training technique performs at par
with supervised learning methods when fed with our features
A Clustering Algorithm Based on an Ensemble of Dissimilarities: An Application in the Bioinformatics Domain
Clustering algorithms such as k-means depend heavily on choosing an appropriate distance metric that reflect accurately the object proximities. A wide range of dissimilarities may be defined that often lead to different clustering results. Choosing the best dissimilarity is an ill-posed problem and learning a general distance from the data is a complex task, particularly for high dimensional problems. Therefore, an appealing approach is to learn an ensemble of dissimilarities. In this paper, we have developed a semi-supervised clustering algorithm that learns a linear combination of dissimilarities considering incomplete knowledge in the form of pairwise constraints. The minimization of the loss function is based on a robust and efficient quadratic optimization algorithm. Besides, a regularization term is considered that controls the complexity of the distance metric learned avoiding overfitting. The algorithm has been applied to the identification of tumor samples using the gene expression profiles, where domain experts provide often incomplete knowledge in the form of pairwise constraints. We report that the algorithm proposed outperforms a standard semi-supervised clustering technique available in the literature and clustering results based on a single dissimilarity. The improvement is particularly relevant for applications with high level of noise
Implementing machine learning for data breach detection
Privata. ai is a User and Entity Behavior Analytics (UEBA) application used for the detection of data breaches in an organization. By tracking down the usual access to personal and sensitive data, it becomes much easier to detect an outlier. These anomalies could result in a real threat to the company’s data security and must, therefore, be promptly detected and addressed. This paper focuses on the managerial challenges that arise from the increasing threat of data breaches and how machine learning could help in protecting organizations from them. For this purpose, large part of the challenge came from understanding the unique specificities of these attacks and finding an appropriate machine learning method to detect them. Given the fact that the data used to train the models was randomly generated, the results should be taken with caution. Nevertheless, the models used for this paper should be taken as a basis for the future development of the software
Spectrogram classification using dissimilarity space
In this work, we combine a Siamese neural network and different clustering techniques to generate a dissimilarity space that is then used to train an SVM for automated animal audio classification. The animal audio datasets used are (i) birds and (ii) cat sounds, which are freely available. We exploit different clustering methods to reduce the spectrograms in the dataset to a number of centroids that are used to generate the dissimilarity space through the Siamese network. Once computed, we use the dissimilarity space to generate a vector space representation of each pattern, which is then fed into an support vector machine (SVM) to classify a spectrogram by its dissimilarity vector. Our study shows that the proposed approach based on dissimilarity space performs well on both classification problems without ad-hoc optimization of the clustering methods. Moreover, results show that the fusion of CNN-based approaches applied to the animal audio classification problem works better than the stand-alone CNNs
- …