22 research outputs found
Deteção de atividades ilícitas de software Bots através do DNS
DNS is a critical component of the Internet where almost all Internet applications
and organizations rely on. Its shutdown can deprive them from being part of the
Internet, and hence, DNS is usually the only protocol to be allowed when Internet
access is firewalled. The constant exposure of this protocol to external entities force
corporations to always be observant of external rogue software that may misuse
the DNS to establish covert channels and perform multiple illicit activities, such as
command and control and data exfiltration.
Most current solutions for bot malware and botnet detection are based on Deep
Packet Inspection techniques, such as analyzing DNS query payloads, which may
reveal private and sensitive information. In addiction, the majority of existing solutions
do not consider the usage of licit and encrypted DNS traffic, where Deep
Packet Inspection techniques are impossible to be used.
This dissertation proposes mechanisms to detect malware bots and botnet behaviors
on DNS traffic that are robust to encrypted DNS traffic and that ensure the
privacy of the involved entities by analyzing instead the behavioral patterns of DNS
communications using descriptive statistics over collected network metrics such as
packet rates, packet lengths, and silence and activity periods. After characterizing
DNS traffic behaviors, a study of the processed data is conducted, followed by the
training of Novelty Detection algorithms with the processed data.
Models are trained with licit data gathered from multiple licit activities, such as
reading the news, studying, and using social networks, in multiple operating systems,
browsers, and configurations. Then, the models were tested with similar
data, but containing bot malware traffic. Our tests show that our best performing
models achieve detection rates in the order of 99%, and 92% for malware bots
using low throughput rates.
This work ends with some ideas for a more realistic generation of bot malware
traffic, as the current DNS Tunneling tools are limited when mimicking licit DNS
usages, and for a better detection of malware bots that use low throughput rates.O DNS é um componente crítico da Internet, já que quase todas as aplicações
e organizações que a usam dependem dele para funcionar. A sua privação pode
deixá-las de fazerem parte da Internet, e por causa disso, o DNS é normalmente
o único protocolo permitido quando o acesso à Internet está restrito. A exposição
constante deste protocolo a entidades externas obrigam corporações a estarem
sempre atentas a software externo ilícito que pode fazer uso indevido do DNS para
estabelecer canais secretos e realizar várias atividades ilícitas, como comando e
controlo e exfiltração de dados.
A maioria das soluções atuais para detecção de malware bots e de botnets são
baseadas em técnicas inspeção profunda de pacotes, como analizar payloads de
pedidos de DNS, que podem revelar informação privada e sensitiva. Além disso,
a maioria das soluções existentes não consideram o uso lícito e cifrado de tráfego
DNS, onde técnicas como inspeção profunda de pacotes são impossíveis de serem
usadas.
Esta dissertação propõe mecanismos para detectar comportamentos de malware
bots e botnets que usam o DNS, que são robustos ao tráfego DNS cifrado e
que garantem a privacidade das entidades envolvidas ao analizar, em vez disso,
os padrões comportamentais das comunicações DNS usando estatística descritiva
em métricas recolhidas na rede, como taxas de pacotes, o tamanho dos pacotes,
e os tempos de atividade e silêncio. Após a caracterização dos comportamentos
do tráfego DNS, um estudo sobre os dados processados é realizado, sendo depois
usados para treinar os modelos de Detecção de Novidades.
Os modelos são treinados com dados lícitos recolhidos de multiplas atividades
lícitas, como ler as notícias, estudar, e usar redes sociais, em multiplos sistemas
operativos e com multiplas configurações. De seguida, os modelos são testados
com dados lícitos semelhantes, mas contendo também tráfego de malware bots.
Os nossos testes mostram que com modelos de Detecção de Novidades é possível
obter taxas de detecção na ordem dos 99%, e de 98% para malware bots que geram
pouco tráfego.
Este trabalho finaliza com algumas ideas para uma geração de tráfego ilícito mais
realista, já que as ferramentas atuais de DNS tunneling são limitadas quando
usadas para imitar usos de DNS lícito, e para uma melhor deteção de situações
onde malware bots geram pouco tráfego.Mestrado em Engenharia de Computadores e Telemátic
End-to-end anomaly detection in stream data
Nowadays, huge volumes of data are generated with increasing velocity through various systems, applications, and activities. This increases the demand for stream and time series analysis to react to changing conditions in real-time for enhanced efficiency and quality of service delivery as well as upgraded safety and security in private and public sectors. Despite its very rich history, time series anomaly detection is still one of the vital topics in machine learning research and is receiving increasing attention. Identifying hidden patterns and selecting an appropriate model that fits the observed data well and also carries over to unobserved data is not a trivial task. Due to the increasing diversity of data sources and associated stochastic processes, this pivotal data analysis topic is loaded with various challenges like complex latent patterns, concept drift, and overfitting that may mislead the model and cause a high false alarm rate. Handling these challenges leads the advanced anomaly detection methods to develop sophisticated decision logic, which turns them into mysterious and inexplicable black-boxes. Contrary to this trend, end-users expect transparency and verifiability to trust a model and the outcomes it produces. Also, pointing the users to the most anomalous/malicious areas of time series and causal features could save them time, energy, and money. For the mentioned reasons, this thesis is addressing the crucial challenges in an end-to-end pipeline of stream-based anomaly detection through the three essential phases of behavior prediction, inference, and interpretation. The first step is focused on devising a time series model that leads to high average accuracy as well as small error deviation. On this basis, we propose higher-quality anomaly detection and scoring techniques that utilize the related contexts to reclassify the observations and post-pruning the unjustified events. Last but not least, we make the predictive process transparent and verifiable by providing meaningful reasoning behind its generated results based on the understandable concepts by a human. The provided insight can pinpoint the anomalous regions of time series and explain why the current status of a system has been flagged as anomalous. Stream-based anomaly detection research is a principal area of innovation to support our economy, security, and even the safety and health of societies worldwide. We believe our proposed analysis techniques can contribute to building a situational awareness platform and open new perspectives in a variety of domains like cybersecurity, and health
Advances in Data Mining Knowledge Discovery and Applications
Advances in Data Mining Knowledge Discovery and Applications aims to help data miners, researchers, scholars, and PhD students who wish to apply data mining techniques. The primary contribution of this book is highlighting frontier fields and implementations of the knowledge discovery and data mining. It seems to be same things are repeated again. But in general, same approach and techniques may help us in different fields and expertise areas. This book presents knowledge discovery and data mining applications in two different sections. As known that, data mining covers areas of statistics, machine learning, data management and databases, pattern recognition, artificial intelligence, and other areas. In this book, most of the areas are covered with different data mining applications. The eighteen chapters have been classified in two parts: Knowledge Discovery and Data Mining Applications
Metric and Representation Learning
All data has some inherent mathematical structure. I am interested in understanding the intrinsic geometric and probabilistic structure of data to design effective algorithms and tools that can be applied to machine learning and across all branches of science.
The focus of this thesis is to increase the effectiveness of machine learning techniques by developing a mathematical and algorithmic framework using which, given any type of data, we can learn an optimal representation. Representation learning is done for many reasons. It could be done to fix the corruption given corrupted data or to learn a low dimensional or simpler representation, given high dimensional data or a very complex representation of the data. It could also be that the current representation of the data does not capture the important geometric features of the data.
One of the many challenges in representation learning is determining ways to judge the quality of the representation learned. In many cases, the consensus is that if d is the natural metric on the representation, then this metric should provide meaningful information about the data. Many examples of this can be seen in areas such as metric learning, manifold learning, and graph embedding. However, most algorithms that solve these problems learn a representation in a metric space first and then extract a metric.
A large part of my research is exploring what happens if the order is switched, that is, learn the appropriate metric first and the embedding later. The philosophy behind this approach is that understanding the inherent geometry of the data is the most crucial part of representation learning. Often, studying the properties of the appropriate metric on the input data sets indicates the type of space, we should be seeking for the representation. Hence giving us more robust representations. Optimizing for the appropriate metric can also help overcome issues such as missing and noisy data. My projects fall into three different areas of representation learning.
1) Geometric and probabilistic analysis of representation learning methods.
2) Developing methods to learn optimal metrics on large datasets.
3) Applications.
For the category of geometric and probabilistic analysis of representation learning methods, we have three projects. First, designing optimal training data for denoising autoencoders. Second, formulating a new optimal transport problem and understanding the geometric structure. Third, analyzing the robustness to perturbations of the solutions obtained from the classical multidimensional scaling algorithm versus that of the true solutions to the multidimensional scaling problem.
For learning optimal metric, we are given a dissimilarity matrix , some function and some a subset of the space of all metrics and we want to find that minimizes . In this thesis, we consider the version of the problem when is the space of metrics defined on a fixed graph. That is, given a graph , we let , be the space of all metrics defined via . For this , we consider the sparse objective function as well as convex objective functions. We also looked at the problem where we want to learn a tree. We also show how the ideas behind learning the optimal metric can be applied to dimensionality reduction in the presence of missing data.
Finally, we look at an application to real world data. Specifically trying to reconstruct ancient Greek text.PHDApplied and Interdisciplinary MathematicsUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/169738/1/rsonthal_1.pd
Timely Classification of Encrypted or ProtocolObfuscated Internet Traffic Using Statistical Methods
Internet traffic classification aims to identify the type of application or protocol that generated
a particular packet or stream of packets on the network. Through traffic classification,
Internet Service Providers (ISPs), governments, and network administrators can
access basic functions and several solutions, including network management, advanced
network monitoring, network auditing, and anomaly detection. Traffic classification is
essential as it ensures the Quality of Service (QoS) of the network, as well as allowing
efficient resource planning.
With the increase of encrypted or obfuscated protocol traffic on the Internet and multilayer
data encapsulation, some classical classification methods have lost interest from the
scientific community. The limitations of traditional classification methods based on port
numbers and payload inspection to classify encrypted or obfuscated Internet traffic have
led to significant research efforts focused on Machine Learning (ML) based classification
approaches using statistical features from the transport layer. In an attempt to increase
classification performance, Machine Learning strategies have gained interest from the scientific
community and have shown promise in the future of traffic classification, specially
to recognize encrypted traffic.
However, ML approach also has its own limitations, as some of these methods have a
high computational resource consumption, which limits their application when classifying
large traffic or realtime
flows. Limitations of ML application have led to the investigation
of alternative approaches, including featurebased
procedures and statistical methods. In
this sense, statistical analysis methods, such as distances and divergences, have been used
to classify traffic in large flows and in realtime.
The main objective of statistical distance is to differentiate flows and find a pattern in
traffic characteristics through statistical properties, which enable classification. Divergences
are functional expressions often related to information theory, which measure the
degree of discrepancy between any two distributions.
This thesis focuses on proposing a new methodological approach to classify encrypted
or obfuscated Internet traffic based on statistical methods that enable the evaluation of
network traffic classification performance, including the use of computational resources
in terms of CPU and memory. A set of traffic classifiers based on KullbackLeibler
and
JensenShannon
divergences, and Euclidean, Hellinger, Bhattacharyya, and Wootters distances
were proposed. The following are the four main contributions to the advancement
of scientific knowledge reported in this thesis.
First, an extensive literature review on the classification of encrypted and obfuscated Internet traffic was conducted. The results suggest that portbased
and payloadbased
methods are becoming obsolete due to the increasing use of traffic encryption and multilayer
data encapsulation. MLbased
methods are also becoming limited due to their computational
complexity. As an alternative, Support Vector Machine (SVM), which is also
an ML method, and the KolmogorovSmirnov
and Chisquared
tests can be used as reference
for statistical classification. In parallel, the possibility of using statistical methods
for Internet traffic classification has emerged in the literature, with the potential of good
results in classification without the need of large computational resources. The potential
statistical methods are Euclidean Distance, Hellinger Distance, Bhattacharyya Distance,
Wootters Distance, as well as KullbackLeibler
(KL) and JensenShannon
divergences.
Second, we present a proposal and implementation of a classifier based on SVM for P2P
multimedia traffic, comparing the results with KolmogorovSmirnov
(KS) and Chisquare
tests. The results suggest that SVM classification with Linear kernel leads to a better classification
performance than KS and Chisquare
tests, depending on the value assigned to
the Self C parameter. The SVM method with Linear kernel and suitable values for the Self
C parameter may be a good choice to identify encrypted P2P multimedia traffic on the
Internet.
Third, we present a proposal and implementation of two classifiers based on KL Divergence
and Euclidean Distance, which are compared to SVM with Linear kernel, configured
with the standard Self C parameter, showing a reduced ability to classify flows based
solely on packet sizes compared to KL and Euclidean Distance methods. KL and Euclidean
methods were able to classify all tested applications, particularly streaming and P2P,
where for almost all cases they efficiently identified them with high accuracy, with reduced
consumption of computational resources. Based on the obtained results, it can be
concluded that KL and Euclidean Distance methods are an alternative to SVM, as these
statistical approaches can operate in realtime
and do not require retraining every time a
new type of traffic emerges.
Fourth, we present a proposal and implementation of a set of classifiers for encrypted
Internet traffic, based on JensenShannon
Divergence and Hellinger, Bhattacharyya, and
Wootters Distances, with their respective results compared to those obtained with methods
based on Euclidean Distance, KL, KS, and ChiSquare.
Additionally, we present a comparative
qualitative analysis of the tested methods based on Kappa values and Receiver
Operating Characteristic (ROC) curves. The results suggest average accuracy values above
90% for all statistical methods, classified as ”almost perfect reliability” in terms of Kappa
values, with the exception of KS. This result indicates that these methods are viable options
to classify encrypted Internet traffic, especially Hellinger Distance, which showed
the best Kappa values compared to other classifiers. We conclude that the considered
statistical methods can be accurate and costeffective
in terms of computational resource
consumption to classify network traffic. Our approach was based on the classification of Internet network traffic, focusing on statistical
distances and divergences. We have shown that it is possible to classify and obtain
good results with statistical methods, balancing classification performance and the
use of computational resources in terms of CPU and memory. The validation of the proposal
supports the argument of this thesis, which proposes the implementation of statistical
methods as a viable alternative to Internet traffic classification compared to methods
based on port numbers, payload inspection, and ML.A classificação de tráfego Internet visa identificar o tipo de aplicação ou protocolo que
gerou um determinado pacote ou fluxo de pacotes na rede. Através da classificação de
tráfego, Fornecedores de Serviços de Internet (ISP), governos e administradores de rede
podem ter acesso às funções básicas e várias soluções, incluindo gestão da rede, monitoramento
avançado de rede, auditoria de rede e deteção de anomalias. Classificar o tráfego é
essencial, pois assegura a Qualidade de Serviço (QoS) da rede, além de permitir planear
com eficiência o uso de recursos.
Com o aumento de tráfego cifrado ou protocolo ofuscado na Internet e do encapsulamento
de dados multicamadas, alguns métodos clássicos da classificação perderam interesse de
investigação da comunidade científica. As limitações dos métodos tradicionais da classificação
com base no número da porta e na inspeção de carga útil payload para classificar
o tráfego de Internet cifrado ou ofuscado levaram a esforços significativos de investigação
com foco em abordagens da classificação baseadas em técnicas de Aprendizagem
Automática (ML) usando recursos estatísticos da camada de transporte. Na tentativa
de aumentar o desempenho da classificação, as estratégias de Aprendizagem Automática
ganharam o interesse da comunidade científica e se mostraram promissoras no futuro da
classificação de tráfego, principalmente no reconhecimento de tráfego cifrado.
No entanto, a abordagem em ML também têm as suas próprias limitações,
pois alguns
desses métodos possuem um elevado consumo de recursos computacionais, o que limita
a sua aplicação para classificação de grandes fluxos de tráfego ou em tempo real. As limitações
no âmbito da aplicação de ML levaram à investigação de abordagens alternativas,
incluindo procedimentos baseados em características e métodos estatísticos. Neste sentido,
os métodos de análise estatística, tais como distâncias e divergências, têm sido utilizados
para classificar tráfego em grandes fluxos e em tempo real.
A distância estatística possui como objetivo principal diferenciar os fluxos e permite encontrar
um padrão nas características de tráfego através de propriedades estatísticas, que
possibilitam a classificação. As divergências são expressões funcionais frequentemente
relacionadas com a teoria da informação, que mede o grau de discrepância entre duas
distribuições quaisquer.
Esta tese focase
na proposta de uma nova abordagem metodológica para classificação de
tráfego cifrado ou ofuscado da Internet com base em métodos estatísticos que possibilite
avaliar o desempenho da classificação de tráfego de rede, incluindo a utilização de recursos
computacionais, em termos de CPU e memória. Foi proposto um conjunto de classificadores
de tráfego baseados nas Divergências de KullbackLeibler
e JensenShannon
e Distâncias Euclidiana, Hellinger, Bhattacharyya e Wootters. A seguir resumemse
os tese.
Primeiro, realizámos uma ampla revisão de literatura sobre classificação de tráfego cifrado
e ofuscado de Internet. Os resultados sugerem que os métodos baseados em porta e
baseados em carga útil estão se tornando obsoletos em função do crescimento da utilização
de cifragem de tráfego e encapsulamento de dados multicamada. O tipo de métodos
baseados em ML também está se tornando limitado em função da complexidade computacional.
Como alternativa, podese
utilizar a Máquina de Vetor de Suporte (SVM),
que também é um método de ML, e os testes de KolmogorovSmirnov
e Quiquadrado
como referência de comparação da classificação estatística. Em paralelo, surgiu na literatura
a possibilidade de utilização de métodos estatísticos para classificação de tráfego
de Internet, com potencial de bons resultados na classificação sem aporte de grandes recursos
computacionais. Os métodos estatísticos potenciais são as Distâncias Euclidiana,
Hellinger, Bhattacharyya e Wootters, além das Divergências de Kullback–Leibler (KL) e
JensenShannon.
Segundo, apresentamos uma proposta e implementação de um classificador baseado na
Máquina de Vetor de Suporte (SVM) para o tráfego multimédia P2P (PeertoPeer),
comparando
os resultados com os testes de KolmogorovSmirnov
(KS) e Quiquadrado.
Os
resultados sugerem que a classificação da SVM com kernel Linear conduz a um melhor
desempenho da classificação do que os testes KS e Quiquadrado,
dependente do valor
atribuído ao parâmetro Self C. O método SVM com kernel Linear e com valores adequados
para o parâmetro Self C pode ser uma boa escolha para identificar o tráfego Par a Par
(P2P) multimédia cifrado na Internet.
Terceiro, apresentamos uma proposta e implementação de dois classificadores baseados
na Divergência de KullbackLeibler (KL) e na Distância Euclidiana, sendo comparados
com a SVM com kernel Linear, configurado para o parâmestro Self C padrão, apresenta
reduzida
capacidade de classificar fluxos com base apenas nos tamanhos dos pacotes
em relação aos métodos KL e Distância Euclidiana. Os métodos KL e Euclidiano foram
capazes de classificar todas as aplicações testadas, destacandose
streaming e P2P, onde
para quase todos os casos foi eficiente identificálas
com alta precisão, com reduzido consumo
de recursos computacionais.Com base nos resultados obtidos, podese
concluir que
os métodos KL e Distância Euclidiana são uma alternativa à SVM, porque essas abordagens
estatísticas podem operar em tempo real e não precisam de retreinamento cada vez
que surge um novo tipo de tráfego.
Quarto, apresentamos uma proposta e implementação de um conjunto de classificadores
para o tráfego de Internet cifrado, baseados na Divergência de JensenShannon
e nas Distâncias
de Hellinger, Bhattacharyya e Wootters, sendo os respetivos resultados comparados
com os resultados obtidos com os métodos baseados na Distância Euclidiana, KL, KS e Quiquadrado.
Além disso, apresentamos uma análise qualitativa comparativa dos
métodos testados com base nos valores de Kappa e Curvas Característica de Operação do
Receptor (ROC). Os resultados sugerem valores médios de precisão acima de 90% para todos
os métodos estatísticos, classificados como “confiabilidade quase perfeita” em valores
de Kappa, com exceçãode KS. Esse resultado indica que esses métodos são opções viáveis
para a classificação de tráfego cifrado da Internet, em especial a Distância de Hellinger,
que apresentou os melhores resultados do valor de Kappa em comparaçãocom os demais
classificadores. Concluise
que os métodos estatísticos considerados podem ser precisos e
económicos em termos de consumo de recursos computacionais para classificar o tráfego
da rede.
A nossa abordagem baseouse
na classificação de tráfego de rede Internet, focando em
distâncias e divergências estatísticas. Nós mostramos que é possível classificar e obter
bons resultados com métodos estatísticos, equilibrando desempenho de classificação e
uso de recursos computacionais em termos de CPU e memória. A validação da proposta
sustenta o argumento desta tese, que propõe a implementação de métodos estatísticos
como alternativa viável à classificação de tráfego da Internet em relação aos métodos com
base no número da porta, na inspeção de carga útil e de ML.Thesis prepared at Instituto de Telecomunicações Delegação
da Covilhã and at the Department
of Computer Science of the University of Beira Interior, and submitted to the
University of Beira Interior for discussion in public session to obtain the Ph.D. Degree in
Computer Science and Engineering.
This work has been funded by Portuguese FCT/MCTES through national funds and, when
applicable, cofunded
by EU funds under the project UIDB/50008/2020, and by operation
Centro010145FEDER000019
C4
Centro
de Competências em Cloud Computing,
cofunded
by the European Regional Development Fund (ERDF/FEDER) through
the Programa Operacional Regional do Centro (Centro 2020). This work has also been
funded by CAPES (Brazilian Federal Agency for Support and Evaluation of Graduate Education)
within the Ministry of Education of Brazil under a scholarship supported by the
International Cooperation Program CAPES/COFECUB Project
9090134/
2013 at the
University of Beira Interior
Metodologias para caracterização de tráfego em redes de comunicações
Tese de doutoramento em Metodologias para caracterização de tráfego em redes de comunicaçõesInternet Tra c, Internet Applications, Internet Attacks, Tra c Pro ling,
Multi-Scale Analysis
abstract Nowadays, the Internet can be seen as an ever-changing platform where new
and di erent types of services and applications are constantly emerging. In
fact, many of the existing dominant applications, such as social networks,
have appeared recently, being rapidly adopted by the user community. All
these new applications required the implementation of novel communication
protocols that present di erent network requirements, according to the service
they deploy. All this diversity and novelty has lead to an increasing need
of accurately pro ling Internet users, by mapping their tra c to the originating
application, in order to improve many network management tasks such
as resources optimization, network performance, service personalization and
security. However, accurately mapping tra c to its originating application
is a di cult task due to the inherent complexity of existing network protocols
and to several restrictions that prevent the analysis of the contents of
the generated tra c. In fact, many technologies, such as tra c encryption,
are widely deployed to assure and protect the con dentiality and integrity
of communications over the Internet. On the other hand, many legal constraints
also forbid the analysis of the clients' tra c in order to protect
their con dentiality and privacy. Consequently, novel tra c discrimination
methodologies are necessary for an accurate tra c classi cation and user
pro ling. This thesis proposes several identi cation methodologies for an
accurate Internet tra c pro ling while coping with the di erent mentioned
restrictions and with the existing encryption techniques. By analyzing the
several frequency components present in the captured tra c and inferring
the presence of the di erent network and user related events, the proposed
approaches are able to create a pro le for each one of the analyzed Internet
applications. The use of several probabilistic models will allow the accurate
association of the analyzed tra c to the corresponding application. Several
enhancements will also be proposed in order to allow the identi cation of
hidden illicit patterns and the real-time classi cation of captured tra c.
In addition, a new network management paradigm for wired and wireless
networks will be proposed. The analysis of the layer 2 tra c metrics and
the di erent frequency components that are present in the captured tra c
allows an e cient user pro ling in terms of the used web-application. Finally,
some usage scenarios for these methodologies will be presented and
discussed
Smart Urban Water Networks
This book presents the paper form of the Special Issue (SI) on Smart Urban Water Networks. The number and topics of the papers in the SI confirm the growing interest of operators and researchers for the new paradigm of smart networks, as part of the more general smart city. The SI showed that digital information and communication technology (ICT), with the implementation of smart meters and other digital devices, can significantly improve the modelling and the management of urban water networks, contributing to a radical transformation of the traditional paradigm of water utilities. The paper collection in this SI includes different crucial topics such as the reliability, resilience, and performance of water networks, innovative demand management, and the novel challenge of real-time control and operation, along with their implications for cyber-security. The SI collected fourteen papers that provide a wide perspective of solutions, trends, and challenges in the contest of smart urban water networks. Some solutions have already been implemented in pilot sites (i.e., for water network partitioning, cyber-security, and water demand disaggregation and forecasting), while further investigations are required for other methods, e.g., the data-driven approaches for real time control. In all cases, a new deal between academia, industry, and governments must be embraced to start the new era of smart urban water systems
Integrated Pest Management of Field Crops
Consumers in the EU and beyond are increasingly concerned about the impact of pesticides on the environment and human health. In the context of EU phytosanitary and environmental policies, the common EU challenge is to reduce dependence on chemicals, improve food quality and increase the potential for developing more bio-based production systems. Therefore, novel control methods and new strategies that reduce the current dependence on insecticides need to be developed, applied and disseminated among stakeholders. As a cornerstone of sustainable agriculture, integrated pest management (IPM) aims to improve farmers' practices to achieve higher profits while improving environmental quality. Implementing the principles of IPM in agricultural production requires new and up-to-date knowledge generated by science and accepted by farmers. In this Special Issue, we focus on recent advances and methods for IPM in field crops. It contains eight original research articles and two review articles dealing with different aspects of IPM in some of the major field crops: Potato, Maize, Soybean, Sugar Beet, Barley, Rice, Eggplant and Quinoa as well as farmer education issues on IPM. The studies published refer to all the basic principles of IPM and give examples of their implementation in different crops and cropping systems. Research on various aspects of the implementation of IPM in crop production is a continuous need. The research presented helps to provide a mosaic picture with examples of how crop-specific, site-specific and knowledge-intensive IPM practices should be considered and translated into workable practices
Social informatics
5th International Conference, SocInfo 2013, Kyoto, Japan, November 25-27, 2013, Proceedings</p
Enhancing Computer Network Security through Improved Outlier Detection for Data Streams
V několika posledních letech se metody strojového učení (zvláště ty zabývající se detekcí odlehlých hodnot - OD) v oblasti kyberbezpečnosti opíraly o zjišťování anomálií síťového provozu spočívajících v nových schématech útoků. Detekce anomálií v počítačových sítích reálného světa se ale stala stále obtížnější kvůli trvalému nárůstu vysoce objemných, rychlých a dimenzionálních průběžně přicházejících dat (SD), pro která nejsou k dispozici obecně uznané a pravdivé informace o anomalitě. Účinná detekční schémata pro vestavěná síťová zařízení musejí být rychlá a paměťově nenáročná a musejí být schopna se potýkat se změnami konceptu, když se vyskytnou. Cílem této disertace je zlepšit bezpečnost počítačových sítí zesílenou detekcí odlehlých hodnot v datových proudech, obzvláště SD, a dosáhnout kyberodolnosti, která zahrnuje jak detekci a analýzu, tak reakci na bezpečnostní incidenty jako jsou např. nové zlovolné aktivity. Za tímto účelem jsou v práci navrženy čtyři hlavní příspěvky, jež byly publikovány nebo se nacházejí v recenzním řízení časopisů. Zaprvé, mezera ve volbě vlastností (FS) bez učitele pro zlepšování již hotových metod OD v datových tocích byla zaplněna navržením volby vlastností bez učitele pro detekci odlehlých průběžně přicházejících dat označované jako UFSSOD. Následně odvozujeme generický koncept, který ukazuje dva aplikační scénáře UFSSOD ve spojení s online algoritmy OD. Rozsáhlé experimenty ukázaly, že UFSSOD coby algoritmus schopný online zpracování vykazuje srovnatelné výsledky jako konkurenční metoda upravená pro OD. Zadruhé představujeme nový aplikační rámec nazvaný izolovaný les založený na počítání výkonu (PCB-iForest), jenž je obecně schopen využít jakoukoliv online OD metodu založenou na množinách dat tak, aby fungovala na SD. Do tohoto algoritmu integrujeme dvě varianty založené na klasickém izolovaném lese. Rozsáhlé experimenty provedené na 23 multidisciplinárních datových sadách týkajících se bezpečnostní problematiky reálného světa ukázaly, že PCB-iForest jasně překonává už zavedené konkurenční metody v 61 % případů a dokonce dosahuje ještě slibnějších výsledků co do vyváženosti mezi výpočetními náklady na klasifikaci a její úspěšností. Zatřetí zavádíme nový pracovní rámec nazvaný detekce odlehlých hodnot a rozpoznávání schémat útoku proudovým způsobem (SOAAPR), jenž je na rozdíl od současných metod schopen zpracovat výstup z různých online OD metod bez učitele proudovým způsobem, aby získal informace o nových schématech útoku. Ze seshlukované množiny korelovaných poplachů jsou metodou SOAAPR vypočítány tři různé soukromí zachovávající podpisy podobné otiskům prstů, které charakterizují a reprezentují potenciální scénáře útoku s ohledem na jejich komunikační vztahy, projevy ve vlastnostech dat a chování v čase. Evaluace na dvou oblíbených datových sadách odhalila, že SOAAPR může soupeřit s konkurenční offline metodou ve schopnosti korelace poplachů a významně ji překonává z hlediska výpočetního času . Navíc se všechny tři typy podpisů ve většině případů zdají spolehlivě charakterizovat scénáře útoků tím, že podobné seskupují k sobě. Začtvrté představujeme algoritmus nepárového kódu autentizace zpráv (Uncoupled MAC), který propojuje oblasti kryptografického zabezpečení a detekce vniknutí (IDS) pro síťovou bezpečnost. Zabezpečuje síťovou komunikaci (autenticitu a integritu) kryptografickým schématem s podporou druhé vrstvy kódy autentizace zpráv, ale také jako vedlejší efekt poskytuje funkcionalitu IDS tak, že vyvolává poplach na základě porušení hodnot nepárového MACu. Díky novému samoregulačnímu rozšíření algoritmus adaptuje svoje vzorkovací parametry na základě zjištění škodlivých aktivit. Evaluace ve virtuálním prostředí jasně ukazuje, že schopnost detekce se za běhu zvyšuje pro různé scénáře útoku. Ty zahrnují dokonce i situace, kdy se inteligentní útočníci snaží využít slabá místa vzorkování.ObhájenoOver the past couple of years, machine learning methods - especially the Outlier Detection (OD) ones - have become anchored to the cyber security field to detect network-based anomalies rooted in novel attack patterns. Due to the steady increase of high-volume, high-speed and high-dimensional Streaming Data (SD), for which ground truth information is not available, detecting anomalies in real-world computer networks has become a more and more challenging task. Efficient detection schemes applied to networked, embedded devices need to be fast and memory-constrained, and must be capable of dealing with concept drifts when they occur. The aim of this thesis is to enhance computer network security through improved OD for data streams, in particular SD, to achieve cyber resilience, which ranges from the detection, over the analysis of security-relevant incidents, e.g., novel malicious activity, to the reaction to them. Therefore, four major contributions are proposed, which have been published or are submitted journal articles. First, a research gap in unsupervised Feature Selection (FS) for the improvement of off-the-shell OD methods in data streams is filled by proposing Unsupervised Feature Selection for Streaming Outlier Detection, denoted as UFSSOD. A generic concept is retrieved that shows two application scenarios of UFSSOD in conjunction with online OD algorithms. Extensive experiments have shown that UFSSOD, as an online-capable algorithm, achieves comparable results with a competitor trimmed for OD. Second, a novel unsupervised online OD framework called Performance Counter-Based iForest (PCB-iForest) is being introduced, which generalized, is able to incorporate any ensemble-based online OD method to function on SD. Two variants based on classic iForest are integrated. Extensive experiments, performed on 23 different multi-disciplinary and security-related real-world data sets, revealed that PCB-iForest clearly outperformed state-of-the-art competitors in 61 % of cases and even achieved more promising results in terms of the tradeoff between classification and computational costs. Third, a framework called Streaming Outlier Analysis and Attack Pattern Recognition, denoted as SOAAPR is being introduced that, in contrast to the state-of-the-art, is able to process the output of various online unsupervised OD methods in a streaming fashion to extract information about novel attack patterns. Three different privacy-preserving, fingerprint-like signatures are computed from the clustered set of correlated alerts by SOAAPR, which characterize and represent the potential attack scenarios with respect to their communication relations, their manifestation in the data's features and their temporal behavior. The evaluation on two popular data sets shows that SOAAPR can compete with an offline competitor in terms of alert correlation and outperforms it significantly in terms of processing time. Moreover, in most cases all three types of signatures seem to reliably characterize attack scenarios to the effect that similar ones are grouped together. Fourth, an Uncoupled Message Authentication Code algorithm - Uncoupled MAC - is presented which builds a bridge between cryptographic protection and Intrusion Detection Systems (IDSs) for network security. It secures network communication (authenticity and integrity) through a cryptographic scheme with layer-2 support via uncoupled message authentication codes but, as a side effect, also provides IDS-functionality producing alarms based on the violation of Uncoupled MAC values. Through a novel self-regulation extension, the algorithm adapts its sampling parameters based on the detection of malicious actions on SD. The evaluation in a virtualized environment clearly shows that the detection rate increases over runtime for different attack scenarios. Those even cover scenarios in which intelligent attackers try to exploit the downsides of sampling