Low-Rank and Sparse Decomposition for Hyperspectral Image Enhancement and Clustering
This dissertation develops new algorithms for hyperspectral image (HSI) analysis. A tensor data format is used for the sparse and low-rank decomposition of hyperspectral data, which can improve classification and detection performance. Multi-view learning is applied to hyperspectral image clustering, and a kernel version of the multi-view technique is proposed to improve clustering performance further. Most low-rank and sparse decomposition algorithms for HSI analysis operate on matrix data. Because an HSI has a high spectral dimension, this dissertation proposes tensor-based extended low-rank and sparse decomposition (TELRSD), which uses the low-rank tensor part for HSI classification and the sparse tensor part for HSI detection. With this tensor-based method, the HSI is processed as 3D data, so the relationships between spectral bands and pixels are preserved during decomposition. The proposed algorithm is compared with other state-of-the-art methods, and the experimental results show that TELRSD performs best among the compared algorithms. HSI clustering is an unsupervised task that groups pixels without labeled information. Low-rank sparse subspace clustering (LRSSC) is one of the most popular algorithms for this task. This dissertation proposes the spatial-spectral multi-view low-rank sparse subspace clustering (SSMLC) algorithm, which extends LRSSC with multi-view learning. In this algorithm, spectral and spatial views are created to generate a multi-view HSI dataset, where spectral partitioning, morphological component analysis (MCA) and principal component analysis (PCA) are applied to create the other views. A kernel version of SSMLC (k-SSMLC) is also investigated.
The performance of SSMLC and k-SSMLC is compared with that of sparse subspace clustering (SSC), low-rank sparse subspace clustering (LRSSC), and spectral-spatial sparse subspace clustering (S4C). The results show that SSMLC improves on LRSSC and that k-SSMLC performs best. Spectral clustering has been proven equivalent to a non-negative matrix factorization (NMF) problem, so NMF can be applied to clustering. To capture local and nonlinear features of the data, orthogonal NMF (ONMF), graph-regularized NMF (GNMF) and kernel NMF (k-NMF) have been proposed for better clustering performance. Nonlinear orthogonal graph NMF (k-OGNMF) combines kernel, orthogonal and graph constraints in NMF, which improves clustering performance further. In the HSI domain, kernel multi-view orthogonal graph NMF (k-MOGNMF), which extends k-OGNMF with a multi-view algorithm, is applied to subspace clustering and achieves better performance and computational efficiency.
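To make the link between NMF and clustering concrete, the following is a minimal sketch of plain NMF with the standard multiplicative updates, assuming NumPy. It is not the dissertation's k-OGNMF or k-MOGNMF: the kernel, orthogonality, graph and multi-view terms are all omitted, and the toy "pixels", the function name `nmf_multiplicative` and every parameter value are hypothetical choices made for illustration.

```python
import numpy as np

def nmf_multiplicative(x, rank, n_iter=200, seed=0, eps=1e-10):
    """Factor a non-negative matrix x (features x samples) as w @ h.

    Uses the classic multiplicative updates for the Frobenius loss.
    """
    rng = np.random.default_rng(seed)
    n_features, n_samples = x.shape
    w = rng.random((n_features, rank)) + eps
    h = rng.random((rank, n_samples)) + eps
    for _ in range(n_iter):
        # Each update rescales the factor elementwise, so signs never flip.
        h *= (w.T @ x) / (w.T @ w @ h + eps)
        w *= (x @ h.T) / (w @ h @ h.T + eps)
    return w, h

# Hypothetical toy data: 90 "pixels" with 20 "bands", drawn from 3 signatures.
rng = np.random.default_rng(1)
spectra = rng.random((20, 3))
true_labels = np.repeat(np.arange(3), 30)
pixels = spectra[:, true_labels] + 0.01 * rng.random((20, 90))

w, h = nmf_multiplicative(pixels, rank=3)
# A common heuristic: assign each pixel to the component with the largest weight.
labels = h.argmax(axis=0)
```

The argmax assignment depends on how the columns of `w` are scaled, which is one reason orthogonality constraints such as those in ONMF are attractive for clustering.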
Bayesian Robust Tensor Factorization for Incomplete Multiway Data
We propose a generative model for robust tensor factorization in the presence
of both missing data and outliers. The objective is to explicitly infer the
underlying low-CP-rank tensor capturing the global information and a sparse
tensor capturing the local information (also considered as outliers), thus
providing the robust predictive distribution over missing entries. The
low-CP-rank tensor is modeled by multilinear interactions between multiple
latent factors on which the column sparsity is enforced by a hierarchical
prior, while the sparse tensor is modeled by a hierarchical view of the Student-t
distribution that associates an individual hyperparameter with each element
independently. For model learning, we develop an efficient closed-form
variational inference under a fully Bayesian treatment, which can effectively
prevent the overfitting problem and scales linearly with data size. In contrast
to existing related works, our method can perform model selection automatically
and implicitly, without the need to tune parameters. More specifically, it can
discover the ground-truth CP rank and automatically adapt the sparsity
inducing priors to various types of outliers. In addition, the tradeoff between
the low-rank approximation and the sparse representation can be optimized in
the sense of maximum model evidence. The extensive experiments and comparisons
with many state-of-the-art algorithms on both synthetic and real-world datasets
demonstrate the advantages of our method from several perspectives.
Comment: in IEEE Transactions on Neural Networks and Learning Systems, 201
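The data model described in this abstract (a low-CP-rank tensor plus sparse outliers, observed only on a subset of entries) can be simulated as in the sketch below, assuming NumPy. It only generates synthetic data of that form and does not implement the paper's variational inference or its priors. The tensor shape, CP rank, outlier rate, noise level and missing-data rate are all hypothetical values.

```python
import numpy as np

rng = np.random.default_rng(0)
shape, cp_rank = (10, 12, 14), 3

# One latent factor matrix per mode; the low-rank tensor is the sum of
# cp_rank outer products of their columns (the CP model).
factors = [rng.standard_normal((dim, cp_rank)) for dim in shape]
low_rank = np.einsum('ir,jr,kr->ijk', *factors)

# Sparse tensor: a few entries carry large outliers, the rest are zero.
outlier_mask = rng.random(shape) < 0.05
sparse = np.where(outlier_mask, rng.uniform(-10, 10, shape), 0.0)

# Small dense noise, then hide about 20% of the entries as missing (NaN).
noise = 0.01 * rng.standard_normal(shape)
observed_mask = rng.random(shape) < 0.8
data = np.where(observed_mask, low_rank + sparse + noise, np.nan)
```

A robust factorization method would receive `data` and `observed_mask` and try to recover `low_rank`, `sparse` and the CP rank without being told any of them.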
On the subspace learning for network attack detection
Doctoral thesis, Universidade de Brasília, Faculdade de Tecnologia, Departamento de Engenharia Elétrica, 2019.
The cost of all types of cyberattacks is increasing for global organizations. The White House of the
U.S. government estimates that malicious cyber activity cost the U.S. economy between US$57 billion and US$109 billion in 2016. Recently, the number of
Denial of Service (DoS), botnet, malicious insider and ransomware attacks has been increasing.
Accenture argues that 89% of survey respondents believe breakthrough technologies,
like artificial intelligence, machine learning and user behavior analytics, are essential for securing
their organizations. To face adversarial models, novel network attacks and attackers' countermeasures
against detection, unsupervised or semi-supervised approaches can be adopted
for network anomaly detection by means of behavioral analysis, where known anomalies are not
needed to train the models.
Signal processing schemes have been applied to detect malicious traffic in computer networks
through unsupervised approaches, showing advances in network traffic analysis, in network
attack detection, and in network intrusion detection systems.
Anomalies can be hard to identify and separate from normal data because they occur rarely
in comparison to normal events. Imbalanced data can compromise the
performance of most standard learning algorithms by biasing them toward the
majority class and reducing their capacity to detect anomalies, which belong to the minority
class. Therefore, anomaly detection algorithms have to be highly discriminating, robust to
corruption and able to deal with the imbalanced data problem.
Some widely adopted anomaly detection algorithms assume that legitimate observations are
Gaussian distributed; however, this assumption may not hold for network traffic, which is
usually characterized by skewed and heavy-tailed distributions.
As a first contribution, we propose Eigensimilarity, an approach based on
signal processing concepts for detecting malicious traffic in computer networks. We
evaluate the accuracy and performance of the proposed framework on a simulated
scenario and on the DARPA 1998 data set. The experiments show that Eigensimilarity detects synflood,
fraggle and port scan attacks accurately and in great detail,
in an automatic and blind fashion, i.e., with an unsupervised approach.
Considering that skewness can improve anomaly detection in imbalanced and skewed data,
such as network traffic, we propose Moment-based Robust Principal Component Analysis (m-RPCA) for network attack detection. m-RPCA is a framework based on distances between
contaminated observations and moments computed from a robust subspace learned by Robust
Principal Component Analysis (RPCA), in order to detect anomalies in skewed data and
network traffic. We evaluate the accuracy of m-RPCA for anomaly detection on simulated
data sets with skewed and heavy-tailed distributions and on the CTU-13 data set. The
experimental evaluation compares our proposal with widely adopted anomaly
detection algorithms and shows that the distance between robust estimates and contaminated observations
can improve both anomaly detection on skewed data and network attack detection.
Moreover, we propose an architecture and approach to evaluate a proof of concept of
Eigensimilarity for malicious behavior detection in mobile applications, in order to detect possible
threats in offline corporate mobile clients. We propose scenarios, features and approaches for
threat analysis by means of Eigensimilarity, and evaluate the processing time required to run
Eigensimilarity on mobile devices.
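The m-RPCA idea in the abstract above scores contaminated observations by their distance to moments computed from an RPCA-learned robust subspace. The sketch below gives a rough illustration, assuming NumPy. It is not the thesis's algorithm: RPCA is solved with a generic principal component pursuit ADMM loop, only the first two moments (mean and covariance) are used, and the Mahalanobis-style score, the `ridge` regularizer and the synthetic "traffic" matrix are hypothetical choices.

```python
import numpy as np

def soft_threshold(x, tau):
    """Elementwise shrinkage: proximal operator of the l1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def svd_threshold(x, tau):
    """Singular value shrinkage: proximal operator of the nuclear norm."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return (u * soft_threshold(s, tau)) @ vt

def rpca_pcp(m, max_iter=1000, tol=1e-7):
    """Split m into low-rank + sparse parts by principal component pursuit."""
    lam = 1.0 / np.sqrt(max(m.shape))       # common default weight on the l1 term
    mu = m.size / (4.0 * np.abs(m).sum())   # common default penalty parameter
    low = np.zeros_like(m)
    sparse = np.zeros_like(m)
    dual = np.zeros_like(m)
    for _ in range(max_iter):
        low = svd_threshold(m - sparse + dual / mu, 1.0 / mu)
        sparse = soft_threshold(m - low + dual / mu, lam / mu)
        residual = m - low - sparse
        dual += mu * residual
        if np.linalg.norm(residual) <= tol * np.linalg.norm(m):
            break
    return low, sparse

def moment_distance_scores(observations, low, ridge=1e-3):
    """Distance of each raw observation to the mean/covariance of the low-rank part."""
    mean = low.mean(axis=0)
    cov = np.cov(low, rowvar=False)
    # Ridge term (an assumption) keeps the rank-deficient covariance invertible.
    cov += ridge * np.trace(cov) / cov.shape[0] * np.eye(cov.shape[0])
    diff = observations - mean
    return np.sqrt(np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1))

# Hypothetical data: 150 flows x 30 features of rank 2, with 8 corrupted rows.
rng = np.random.default_rng(0)
n_samples, n_features = 150, 30
clean = rng.standard_normal((n_samples, 2)) @ rng.standard_normal((2, n_features))
traffic = clean.copy()
attack_rows = rng.choice(n_samples, size=8, replace=False)
for row in attack_rows:
    cols = rng.choice(n_features, size=5, replace=False)
    traffic[row, cols] += rng.choice([-10.0, 10.0], size=5)

low, sparse = rpca_pcp(traffic)
scores = moment_distance_scores(traffic, low)
flagged = np.argsort(scores)[-len(attack_rows):]
```

Scoring the raw rows of `traffic` against moments taken from `low`, rather than against moments of `traffic` itself, keeps the corrupted rows from inflating the very statistics they are measured against.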